A NOVEL BAP28 GENE AND PROTEIN

Related Applications

The present application claims priority to U.S. Provisional Patent Application Serial No.
60/141,323, filed June 25, 1999 and U.S. Provisional Patent Application Serial No. 60/176,880, filed 
January 18, 2000, the disclosures of which are incorporated herein by reference in their entireties. 

FIELD OF THE INVENTION 

The present invention is directed to polynucleotides encoding a human BAP28 polypeptide
as well as regulatory regions located at the 5'- and 3'-ends of said coding region. The invention also
concerns polypeptides encoded by the BAP28 gene. The invention also deals with antibodies directed
specifically against such polypeptides that are useful as diagnostic reagents. The invention further
encompasses biallelic markers of the BAP28 gene useful in genetic analysis, and more particularly
associated with prostate cancer and useful in diagnosis.

BACKGROUND OF THE INVENTION 
Prostate Cancer 

The incidence of prostate cancer has dramatically increased over the last decades. It
averages 30-50/100,000 males in Western European countries as well as within the US White male
population. In these countries, it has recently become the most commonly diagnosed malignancy,
being one of every four cancers diagnosed in American males. Prostate cancer's incidence is very
much population specific, since it varies from 2/100,000 in China, to over 80/100,000 among
African-American males.

In France, the incidence of prostate cancer is 35/100,000 males and it is increasing by
10/100,000 per decade. Mortality due to prostate cancer is also growing accordingly. It is the second
cause of cancer death among French males, and the first one among French males aged over 70. This
makes prostate cancer a serious burden in terms of public health.

Prostate cancer is a latent disease. Many men carry prostate cancer cells without overt signs
of disease. Autopsies of individuals dying of other causes show prostate cancer cells in 30 % of men
at age 50 and in 60 % of men at age 80. Furthermore, prostate cancer can take up to 10 years to kill a
patient after the initial diagnosis.

The progression of the disease usually goes from a well-defined mass within the prostate to a
breakdown and invasion of the lateral margins of the prostate, followed by metastasis to regional
lymph nodes, and metastasis to the bone marrow. Cancer metastasis to bone is common and often
associated with uncontrollable pain.

Unfortunately, in 80 % of cases, diagnosis of prostate cancer is established when the disease
has already metastasized to the bones. Of special interest is the observation that prostate cancers
frequently grow more rapidly in sites of metastasis than within the prostate itself.

Early-stage diagnosis of prostate cancer mainly relies today on Prostate Specific Antigen
(PSA) dosage, which allows the detection of prostate cancer seven years before clinical symptoms
become apparent. The effectiveness of PSA dosage diagnosis is however limited, due to its inability
to discriminate between malignant and non-malignant conditions of the organ and because not all
prostate cancers give rise to an elevated serum PSA concentration. Furthermore, PSA dosage and
other currently available approaches such as physical examination, tissue biopsy and bone scans are of
limited value in predicting disease progression.

Therefore, there is a strong need for a reliable diagnostic procedure which would enable a 
more systematic early-stage prostate cancer prognosis. 

Although an early-stage prostate cancer prognosis is important, the possibility of measuring
the period of time during which treatment can be deferred is also of interest, as currently available
medicaments are expensive and generate significant adverse effects. However, the aggressiveness of
prostate tumors varies widely. Some tumors are relatively aggressive, doubling every six months,
whereas others are slow-growing, doubling once every five years. In fact, the majority of prostate
cancers grow relatively slowly and never become clinically manifest. Very often, affected patients
are among the elderly and die from another disease before prostate cancer actually develops. Thus, a
significant question in treating prostate carcinoma is how to discriminate between tumors that will
progress and those that will not progress during the expected lifetime of the patient.

Hence, there is also a strong need for detection means which may be used to evaluate the 
aggressiveness or the development potential of prostate cancer tumors once diagnosed. 

Furthermore, at the present time, there is no means to predict prostate cancer susceptibility.
It would also be very beneficial to detect individual susceptibility to prostate cancer. This could allow
preventive treatment and a careful follow-up of the development of the tumor.

A further consequence of the slow growth rate of prostate cancer is that few cancer cells are
actively dividing at any one time, rendering prostate cancer generally resistant to radiation and
chemotherapy. Surgery is the mainstay of treatment but it is largely ineffective and removes the
ejaculatory ducts, resulting in impotence. Oral oestrogens and luteinizing hormone-releasing hormone
analogs are also used for treatment of prostate cancer. These hormonal treatments provide marked
improvement for many patients, but they only provide temporary relief. Indeed, most of these cancers
soon relapse with the development of hormone-resistant tumor cells, and the oestrogen treatment can
lead to serious cardiovascular complications. Consequently, there is a strong need for preventive and
curative treatment of prostate cancer.

Efficacy/tolerance prognosis could be valuable in prostate cancer therapy. Indeed, hormonal
therapy, the main treatment currently available, presents significant side effects. The use of
chemotherapy is limited because of the small number of patients with chemosensitive tumors.
Furthermore, the age profile of the prostate cancer patient and intolerance to chemotherapy make the
systematic use of this treatment very difficult.

Therefore, a reliable assessment of the eventual efficacy of a medicament to be
administered to a prostate cancer patient, as well as the patient's eventual tolerance to it, may make it
possible to enhance the benefit/risk ratio of prostate cancer treatment.

BAP28 

Bowcock et al. (1998) conducted studies to identify proteins interacting with the first 304
amino-terminal amino acid residues of the breast cancer related gene, BRCA1. Bowcock et al. thereby
identified a BAP28 cDNA encoding a 515 amino acid protein associating with BRCA1 in a yeast
two-hybrid screen, but whose association with BRCA1 could not be confirmed in a two-hybrid screen in
mammalian cells.

SUMMARY OF THE INVENTION 

The present invention pertains to nucleic acid molecules comprising the genomic sequence
of a novel human BAP28 gene and BAP28 protein. The BAP28 genomic sequence comprises
regulatory sequences located upstream and downstream of the transcribed portion of said gene, these
regulatory sequences being also part of the invention.

The invention also deals with complete cDNA sequences encoding the BAP28 protein, as
well as with the corresponding translation product.

Oligonucleotide probes or primers hybridizing specifically with a BAP28 genomic or cDNA
sequence are also part of the present invention, as well as DNA amplification and detection methods
using said primers and probes.

A further object of the invention consists of recombinant vectors comprising any of the
nucleic acid sequences described herein, and in particular of recombinant vectors comprising a BAP28
regulatory sequence or a sequence encoding a BAP28 protein, as well as of cell hosts and transgenic
non-human animals comprising said nucleic acid sequences or recombinant vectors.

The invention is also directed to BAP28 polymorphisms and BAP28-related biallelic markers
as well as the use of BAP28-related biallelic markers in establishing genetic associations with
disease. BAP28-related biallelic markers can be used for diagnosis, staging, prognosis and monitoring
of disease, and the efficient design and evaluation of suitable therapeutic solutions including
individualized strategies for optimizing drug usage, and screening of potential new medicament
candidates. More particularly, the invention concerns an association between BAP28-related biallelic
markers and prostate cancer.

Finally, the invention is directed to methods for the screening of substances or molecules that
inhibit the expression of BAP28, as well as to methods for the screening of substances or molecules
that interact with a BAP28 polypeptide or that modulate the activity of a BAP28 polypeptide.

BRIEF DESCRIPTION OF THE DRAWINGS

Figure 1 is a diagram showing the genomic structure of the genes BAP28 and PCTA-1. The
arrow represents the DNA in the 5' to 3' direction. The boxes represent the exons.

Figure 2 is a diagram showing some alternative cDNA forms of the PCTA-1 gene.

Figure 3 is an alignment of the human BAP28 protein (H) with its homologues from
Drosophila melanogaster (ORF from AE003615; D), Arabidopsis thaliana (AAF63640; A),
Schizosaccharomyces pombe (O60179; S), Caenorhabditis elegans (Q23495; C), and Saccharomyces
cerevisiae (YJK9_YEAST; Y). In the C-terminal part of the protein alignment, a box indicates the
position of a conserved HEAT repeat, which is described as being involved in protein-protein
interaction. For Drosophila melanogaster, the sequence AE003615 describes a gene CG10805 with
6 exons. A new analysis showed that exons 2, 3, 4, 5, and 6 present a homology with BAP28.
Therefore, a new cDNA has been generated consisting of 21 bp upstream of exon 2, exon 2, intron 2,
and exons 3, 4, 5, and 6. This cDNA encodes a protein of 2096 amino acids which is designated D in
Figure 3.

Figure 4 is an alignment of the human BAP28 protein and 3 protein segments from
Tetraodon nigroviridis, likely part of the same protein. The following sequences from GenBank have
been assembled into contigs in order to generate 3 segments of the genomic sequence of Tetraodon
(CNS01RV3 + CNS03LT9 -> tetraodon3 ; CNS02AXF + CNS03INT -> tetraodon1 ; CNS02AXG + CNS01RV4 +
CNS03LTA + CNS03INS -> tetraodon2). The 3 protein fragments which are similar to BAP28 have
been found in these assembled regions. Furthermore, the exons encoding the 3 protein segments have
the same size and the same structure in human BAP28 and in Tetraodon. The amino acid sequences
encoded by these exons have been aligned with the human BAP28 protein.

Figure 5 is a diagram showing the allelic association analysis in chromosomal region 1q43.

Figure 6 is a diagram showing the genotypic association analysis in chromosomal region 1q43.

Figure 7 is a table demonstrating the results of a haplotype association analysis between
prostate cancer cases and haplotypes comprising BAP28-related biallelic markers. Figure 7A
presents the results for the two-marker haplotypes. Figure 7B presents the results for the three-marker
haplotypes.

Figure 8 is a table demonstrating the results of a haplotype association analysis between
familial prostate cancer cases and haplotypes comprising BAP28-related biallelic markers. Figure 8A
presents the results for the two-marker haplotypes. Figure 8B presents the results for the three-marker
haplotypes.

Figure 9 is a table demonstrating the results of a haplotype association analysis between early
onset familial prostate cancer cases (less than 65 years old) and haplotypes comprising BAP28-related
biallelic markers. Figure 9A presents the results for the two-marker haplotypes. Figure 9B presents
the results for the three-marker haplotypes.

Figure 10 is a table demonstrating the results of a haplotype association analysis between
sporadic prostate cancer cases and haplotypes comprising BAP28-related biallelic markers. Figure 10A
presents the results for the two-marker haplotypes. Figure 10B presents the results for the
three-marker haplotypes.

Figure 11 is a table demonstrating the results of a haplotype association analysis between
informative sporadic prostate cancer cases and haplotypes comprising BAP28-related biallelic
markers. Figure 11A presents the results for the two-marker haplotypes. Figure 11B presents the
results for the three-marker haplotypes.

Figures 12A and 12B are tables summarizing the results of haplotype frequency analyses
between prostate cancer and three preferred haplotypes.

Figure 13 is a half-tone reproduction of the gels showing the tissue specificity of the
BAP28 expression, more particularly the segment comprising the exons 43 to A.

Figure 13A: Wells 1 and 13: Molecular weight markers X - 300ng; Well 2: Mix PCR water = negative
control; Well 3: Marathon Ready cDNA Human Testis: positive Tissue (CLONTECH Lot N°9110553);
Well 4: Marathon Ready cDNA Human Brain: negative Tissue; Well 5: Marathon Ready cDNA Human
Cerebellum: negative Tissue; Well 6: Marathon Ready cDNA Human Cerebral Cortex: negative
Tissue; Well 7: Marathon Ready cDNA Human Hippocampus: positive Tissue (CLONTECH Lot
N°9040528); Well 8: Marathon Ready cDNA Human Hypothalamus: negative Tissue; Well 9:
Marathon Ready cDNA Human Fetal Kidney: negative Tissue; Well 10: Marathon Ready cDNA
Human Thyroid: negative Tissue; Well 11: Marathon Ready cDNA Human Bone Marrow: negative
Tissue; Well 12: Marathon Ready cDNA Human Leukemia, promyelocytic HL60: negative Tissue.

Figure 13B: Wells 1 and 7: Molecular weight markers X - 300ng; Well 2: Marathon Ready cDNA
Human Leukemia, lymphoblastic MOLT4: negative Tissue; Well 3: Marathon Ready cDNA Human
Leukemia, chronic myelogenous K-562: positive Tissue (CLONTECH Lot N°9120565); Well 4:
Marathon Ready cDNA Human Fetal Liver: negative Tissue; Well 5: Marathon Ready cDNA
Human Stomach: negative Tissue; Well 6: Marathon Ready cDNA Human Prostate: negative
Tissue.

Figure 13C: Wells 1 and 13: Molecular weight markers X - 300ng; Well 2: cDNA Human
Testis: negative Tissue; Well 3: cDNA Human Cerebellum: positive Tissue (RNA PolyA+
CLONTECH - Lot N°8070047 - Ref Cat:6543-1); Well 4: cDNA Human Corpus Callosum: negative
Tissue; Well 5: cDNA Human Substantia Nigra: positive Tissue (RNA PolyA+ CLONTECH - Lot
N°8090745 - Ref Cat:6580-1); Well 6: cDNA Human Amygdala: negative Tissue; Well 7: cDNA
Human Thalamus: positive Tissue (RNA PolyA+ CLONTECH - Lot N°9031131 - Ref Cat:6582-1);
Well 8: cDNA Human Hippocampus: positive Tissue (RNA PolyA+ CLONTECH - Lot N°8040059 -
Ref Cat:6578-1); Well 9: cDNA Human Caudate Nucleus: positive Tissue (RNA PolyA+
CLONTECH - Lot N°6120286 - Ref Cat:6575-1); Well 10: cDNA Human Fetal Brain: negative
Tissue; Well 11: cDNA Human Skeletal Muscle: negative Tissue; Well 12: cDNA Human Lung:
negative Tissue.

Figure 13D: Wells 1 and 13: Molecular weight markers X - 300ng; Well 2: cDNA
Human Kidney: negative Tissue; Well 3: cDNA Human Placenta: negative Tissue; Well 4: cDNA
Human Spleen: negative Tissue; Well 5: cDNA Human Fetal Liver: negative Tissue; Well 6:
cDNA Human Thyroid Gland: negative Tissue; Well 7: cDNA Human Leukemia, lymphoblastic:
negative Tissue; Well 8: cDNA Human Spinal Cord: positive Tissue (RNA PolyA+ CLONTECH -
Lot N°9040709 - Ref Cat:6593-1); Well 9: cDNA Human Pituitary Gland: positive Tissue (RNA
PolyA+ CLONTECH - Lot N°6080167 - Ref Cat:6584-1); Well 10: cDNA Human Adrenal Gland:
negative Tissue; Well 11: cDNA Human Trachea: negative Tissue; Well 12: cDNA Human
Leukemia, chronic myelogenous: negative Tissue.

Figure 13E: Wells 1 and 13: Molecular weight
markers X - 300ng; Well 2: cDNA Human Salivary Gland: negative Tissue; Well 3: cDNA Human
Leukemia, promyelocyte: negative Tissue; Well 4: cDNA Human Small Intestine: negative Tissue;
Well 5: cDNA Human Pancreas: negative Tissue; Well 6: cDNA Human Stomach: negative Tissue;
Well 7: cDNA Human Mammary Gland: positive Tissue (RNA PolyA+ CLONTECH - Lot
N°9031125 - Ref Cat:6545-1); Well 8: cDNA Human Bone Marrow: negative Tissue; Well 9:
cDNA Human Thymus: negative Tissue; Well 10: cDNA Human Uterus: negative Tissue; Well 11:
cDNA Human Prostate: negative Tissue; Well 12: cDNA Human Prostate: negative Tissue.

Figure 14 is a block diagram of an exemplary computer system.

Figure 15 is a flow diagram illustrating one embodiment of a process 200 for comparing a new
nucleotide or protein sequence with a database of sequences in order to determine the homology levels
between the new sequence and the sequences in the database.

Figure 16 is a flow diagram illustrating one embodiment of a process 250 in a computer for
determining whether two sequences are homologous.

Figure 17 is a flow diagram illustrating one embodiment of an identifier process 300 for
detecting the presence of a feature in a sequence.

Brief Description of the sequences provided in the Sequence Listing 

SEQ ID No 1 contains the genomic sequence of the BAP28 gene comprising the exons and
introns, and the 5' and 3' regulatory regions (respectively the upstream and downstream untranscribed
regions). Furthermore, SEQ ID No 1 also contains the genomic sequence of the PCTA-1 gene. The
coding strand of the PCTA-1 gene is on the strand opposite to the coding strand of BAP28.

SEQ ID No 2 contains a first cDNA sequence of the BAP28 gene consisting of the exons 1 to
45. SEQ ID No 3 contains a second cDNA sequence of the BAP28 gene consisting of the exons 1 to
44, 45b and A'. SEQ ID No 4 contains a sequence of the BAP28 cDNA segment consisting of the
exons B' and A'. SEQ ID No 5 contains the BAP28 amino acid sequence encoded by the cDNAs of
SEQ ID Nos 2 and 3.

SEQ ID No 6 contains a first cDNA sequence of the PCTA-1 gene consisting of the exons 0
to 9. SEQ ID No 7 contains a second cDNA sequence of the PCTA-1 gene consisting of the exons 0, 1,
2, 3, 4, 5, 6, 6bis, 7, 8, and 9. SEQ ID No 8 contains a third cDNA sequence of the PCTA-1 gene
consisting of the exons 0 to 8, 9bis and 9ter. SEQ ID No 9 contains the sequence of a cDNA fragment
of the PCTA-1 gene comprising exons C and A. SEQ ID No 10 contains the sequence of a cDNA
fragment of the PCTA-1 gene comprising exons B, 0, 1 and 2. SEQ ID No 11 contains the sequence of
a cDNA fragment of the PCTA-1 gene comprising exons A, 1 and 2. SEQ ID No 12 contains the
sequence of a cDNA fragment of the PCTA-1 gene comprising exons A, D, 0, 1, and 2. SEQ ID No 13
contains a fourth cDNA sequence of the PCTA-1 gene comprising exons A, 0, 1, 2, 3, 9bis and 9ter.
SEQ ID No 14 contains the PCTA-1 amino acid sequence encoded by the cDNA of SEQ ID No 6.
SEQ ID No 15 contains the PCTA-1 amino acid sequence encoded by the cDNA of SEQ ID No 7.
SEQ ID No 16 contains the PCTA-1 amino acid sequence encoded by the cDNA of SEQ ID No 8.
SEQ ID No 17 contains the PCTA-1 amino acid sequence encoded by the cDNA of SEQ ID No 13.

SEQ ID Nos 18-31 contain the genomic amplicons respectively designated as 99-7177,
99-7212, 99-7193, 99-7186, 99-7182, 99-1585, 99-1587, 99-13798, 99-1601, 99-13808, 99-13810,
99-13790, 99-13809, and 99-1597.

SEQ ID Nos 32-61 contain the sequence of the following primers : BAP283Ra6283,
BAP283Ra6324n, BAP28-exALF7311, BAP28-exALF7319n, PCTAexALF12, PCTAexALF13n,
PCTAexALR60, PCTAexALR12n, PCTAexBLF33, PCTAexBLF120n, PCTAexBLRHO,
PCTAexBLR40n, PCTA5Ra220n, PCTA5Ra230, PCTA_5Ra400, PCTA_5Ran_400,
PCTA_5Ra_394, PCTA exD5Ra, PCTA exDSRan, PCTA_exC5Ra, PCTA_exC5Ran,
PCTAex9terLR330, PCTAex9terLR325n, PCTAexCLF120, PCTAexCLF130n, BAP28polyTcourt,
BAP281LF12.1, BAP28LR6726.1, BAP28LF26SalI and BAP28LR6717SalI, respectively.

SEQ ID No 62 contains a primer containing the additional PU 5' sequence described further
in Example 2. SEQ ID No 63 contains a primer containing the additional RP 5' sequence described
further in Example 2.

In accordance with the regulations relating to Sequence Listings, the following codes have
been used in the Sequence Listing to indicate the locations of biallelic markers within the sequences
and to identify each of the alleles present at the polymorphic base. The code "r" in the sequences
indicates that one allele of the polymorphic base is a guanine, while the other allele is an adenine. The
code "y" in the sequences indicates that one allele of the polymorphic base is a thymine, while the
other allele is a cytosine. The code "m" in the sequences indicates that one allele of the polymorphic
base is an adenine, while the other allele is a cytosine. The code "k" in the sequences indicates that
one allele of the polymorphic base is a guanine, while the other allele is a thymine. The code "s" in the
sequences indicates that one allele of the polymorphic base is a guanine, while the other allele is a
cytosine. The code "w" in the sequences indicates that one allele of the polymorphic base is an
adenine, while the other allele is a thymine. The nucleotide code of the original allele for each
biallelic marker is the following:



Biallelic marker    Original allele
A1                  G
A2                  C
A3                  T
A4                  C
A5                  C
A6                  T
A7                  T
A8                  G
A9                  T
A10                 G
A11                 G
A12                 A
A13                 T
A14                 T
A15                 A
A16                 G
A17                 T
A18                 T
A19                 C
A20                 G
A21                 G
A22                 T
A23                 G
A24                 G
A25                 G
A26                 C
A27                 A
A28                 A
A29                 C
A30                 A
A31                 C
A32                 G
A33                 G
A34                 A
A35                 G
A36                 G
A37                 T
A38                 A
A39                 C
A40                 C
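
The six ambiguity codes described above for the Sequence Listing can be held in a small lookup
table. The following is a minimal illustrative sketch, not part of the Sequence Listing itself; the
names BIALLELIC_CODES, alleles_at and expand_alleles are hypothetical and chosen only for this
example, and the input sequences are made up.

```python
from itertools import product

# Two-base ambiguity codes exactly as listed in the text: code -> (allele, other allele).
BIALLELIC_CODES = {
    "r": ("G", "A"),
    "y": ("T", "C"),
    "m": ("A", "C"),
    "k": ("G", "T"),
    "s": ("G", "C"),
    "w": ("A", "T"),
}


def alleles_at(sequence, position):
    """Return the two alleles at a 1-based position, or None if the base is not polymorphic."""
    return BIALLELIC_CODES.get(sequence[position - 1].lower())


def expand_alleles(sequence):
    """Return every concrete sequence obtained by resolving each ambiguity code."""
    # Ordinary bases contribute a single choice; ambiguity codes contribute two.
    choices = [BIALLELIC_CODES.get(base.lower(), (base.upper(),)) for base in sequence]
    return ["".join(combo) for combo in product(*choices)]


if __name__ == "__main__":
    print(alleles_at("acgtrca", 5))   # the "r" at position 5 stands for G or A
    print(expand_alleles("ary"))      # four concrete sequences for two biallelic sites
```

A sequence carrying n biallelic codes expands to 2^n concrete sequences, so such an expansion is
only practical for short amplicon segments.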



In some instances, the polymorphic bases of the biallelic markers alter the identity of an
amino acid in the encoded polypeptide. This is indicated in the accompanying Sequence Listing by
use of the feature VARIANT, placement of an Xaa at the position of the polymorphic amino acid, 
and definition of Xaa as the two alternative amino acids. For example if one allele of a biallelic 
marker is the codon CAC, which encodes histidine, while the other allele of the biallelic marker is 
CAA, which encodes glutamine, the Sequence Listing for the encoded polypeptide will contain an 
Xaa at the location of the polymorphic amino acid. In this instance, Xaa would be defined as being 
histidine or glutamine. 
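
The histidine/glutamine example above can be checked with a short sketch that translates the two
alleles of a codon with the standard genetic code and reports an Xaa only when the encoded amino
acids differ. This is an illustration under stated assumptions: the function name variant_residue
and the output format "Xaa(H/Q)" are hypothetical and are not the notation of the Sequence Listing.

```python
from itertools import product

# Standard genetic code, codons enumerated in T, C, A, G order; "*" marks a stop codon.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): amino_acid
    for codon, amino_acid in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def variant_residue(codon_allele_1, codon_allele_2):
    """Return the one-letter amino acid if both alleles agree, else an Xaa annotation."""
    aa1 = CODON_TABLE[codon_allele_1.upper()]
    aa2 = CODON_TABLE[codon_allele_2.upper()]
    if aa1 == aa2:
        return aa1  # silent polymorphism: no VARIANT feature would be needed
    return "Xaa(" + aa1 + "/" + aa2 + ")"


if __name__ == "__main__":
    print(variant_residue("CAC", "CAA"))  # histidine versus glutamine
    print(variant_residue("CAC", "CAT"))  # both alleles encode histidine
```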

In other instances, Xaa may indicate an amino acid whose identity is unknown because of
nucleotide sequence ambiguity. In this instance, the feature UNSURE is used, an Xaa is placed at
the position of the unknown amino acid, and Xaa is defined as being any of the 20 amino acids or
a limited number of amino acids suggested by the genetic code.



DETAILED DESCRIPTION OF THE INVENTION 

The present invention concerns polynucleotides and polypeptides related to the BAP28
gene. Oligonucleotide probes and primers hybridizing specifically with a genomic or the cDNA
sequences of BAP28 are also part of the invention. A further object of the invention consists of
recombinant vectors comprising any of the nucleic acid sequences described in the present
invention, and in particular recombinant vectors comprising a regulatory region of BAP28 or a
sequence encoding the BAP28 protein, as well as cell hosts comprising said nucleic acid sequences
or recombinant vectors. The invention also encompasses methods of screening of molecules which
inhibit the expression of the BAP28 gene or which modulate the activity of, or interact with, the
BAP28 protein. The invention also deals with antibodies directed specifically against such
polypeptides that are useful as diagnostic reagents.

The invention also concerns BAP28-related biallelic markers which can be used in any
method of genetic analysis including linkage studies in families, linkage disequilibrium studies in
populations and association studies of case-control populations. An important aspect of the present
invention is that some BAP28-related biallelic markers present an association with prostate
cancer.

Definitions 

Before describing the invention in greater detail, the following definitions are set forth to 
illustrate and define the meaning and scope of the terms used to describe the invention herein. 

The term "BAP28 gene", when used herein, encompasses genomic, mRNA and cDNA
sequences encoding the BAP28 protein, including the untranslated regulatory regions of the genomic
DNA.

The term "heterologous protein", when used herein, is intended to designate any protein or
polypeptide other than the BAP28 protein. More particularly, the heterologous protein is a
compound which can be used as a marker in further experiments with a BAP28 regulatory region.

The term "isolated" requires that the material be removed from its original environment
(e.g., the natural environment if it is naturally occurring). For example, a naturally-occurring
polynucleotide or polypeptide present in a living animal is not isolated, but the same polynucleotide
or DNA or polypeptide, separated from some or all of the coexisting materials in the natural system,
is isolated. Such polynucleotide could be part of a vector and/or such polynucleotide or polypeptide
could be part of a composition, and still be isolated in that the vector or composition is not part of its
natural environment.

As used herein, the term "purified" does not require absolute purity; rather, it is intended as
a relative definition. Purification of starting material or natural material to at least one order of
magnitude, preferably two or three orders, and more preferably four or five orders of magnitude is
expressly contemplated. As an example, purification from 0.1 % concentration to 10 %
concentration is two orders of magnitude.
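
The arithmetic behind this relative definition is a base-10 logarithm of the ratio of final to
starting concentration. A minimal sketch follows; the function name purification_orders is
hypothetical and used only for illustration.

```python
import math


def purification_orders(start_percent, final_percent):
    """Return the purification expressed in orders of magnitude (log10 of the fold enrichment)."""
    if start_percent <= 0 or final_percent <= 0:
        raise ValueError("concentrations must be positive")
    return math.log10(final_percent / start_percent)


if __name__ == "__main__":
    # The example from the text: 0.1 % to 10 % is a 100-fold enrichment.
    print(round(purification_orders(0.1, 10.0), 6))
```

Under the same measure, a fold purification stated directly (for example 10,000-fold) corresponds
to log10 of that fold value, i.e. four orders of magnitude.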

To illustrate, individual cDNA clones isolated from a cDNA library have been
conventionally purified to electrophoretic homogeneity. The sequences obtained from these clones
could not be obtained directly either from the library or from total human DNA. The cDNA clones
are not naturally occurring as such, but rather are obtained via manipulation of a partially purified
naturally occurring substance (messenger RNA). The conversion of mRNA into a cDNA library
involves the creation of a synthetic substance (cDNA) and pure individual cDNA clones can be
isolated from the synthetic library by clonal selection. Thus, creating a cDNA library from
messenger RNA and subsequently isolating individual clones from that library results in an
approximately 10^4 to 10^6 fold purification of the native message.

The term "purified" is further used herein to describe a polypeptide or polynucleotide of
the invention which has been separated from other compounds including, but not limited to,
polypeptides or polynucleotides, carbohydrates, lipids, etc. The term "purified" may be used to
specify the separation of monomeric polypeptides of the invention from oligomeric forms such as
homo- or hetero-dimers, trimers, etc. The term "purified" may also be used to specify the separation
of covalently closed polynucleotides from linear polynucleotides. A polynucleotide is substantially
pure when at least about 50%, preferably 60 to 75% of a sample exhibits a single polynucleotide
sequence and conformation (linear versus covalently closed). A substantially pure polypeptide or
polynucleotide typically comprises about 50%, preferably 60 to 90% weight/weight of a polypeptide
or polynucleotide sample, respectively, more usually about 95%, and preferably is over about 99%
pure. Polypeptide and polynucleotide purity, or homogeneity, is indicated by a number of means
well known in the art, such as agarose or polyacrylamide gel electrophoresis of a sample, followed
by visualizing a single band upon staining the gel. For certain purposes higher resolution can be
provided by using HPLC or other means well known in the art. As an alternative embodiment,
purification of the polypeptides and polynucleotides of the present invention may be expressed as "at
least" a percent purity relative to heterologous polypeptides and polynucleotides (DNA, RNA or both).
As a preferred embodiment, the polypeptides and polynucleotides of the present invention are at least
10%, 20%, 30%, 40%, 50%, 60%, 70%, 80%, 90%, 95%, 96%, 97%, 98%, 99%, or 100% pure
relative to heterologous polypeptides and polynucleotides, respectively. As a further preferred
embodiment the polypeptides and polynucleotides have a purity ranging from any number, to the
thousandth position, between 90% and 100% (e.g., a polypeptide or polynucleotide at least 99.995%
pure) relative to either heterologous polypeptides or polynucleotides, respectively, or as a
weight/weight ratio relative to all compounds and molecules other than those existing in the carrier.
Each number representing a percent purity, to the thousandth position, may be claimed as individual
species of purity.

The term "polypeptide" refers to a polymer of amino acids without regard to the length of
the polymer; thus, peptides, oligopeptides, and proteins are included within the definition of
polypeptide. This term also does not specify or exclude post-expression modifications of
polypeptides, for example, polypeptides which include the covalent attachment of glycosyl groups,
acetyl groups, phosphate groups, lipid groups and the like are expressly encompassed by the term
polypeptide. Also included within the definition are polypeptides which contain one or more
analogs of an amino acid (including, for example, non-naturally occurring amino acids, amino acids
which only occur naturally in an unrelated biological system, modified amino acids from
mammalian systems etc.), polypeptides with substituted linkages, as well as other modifications
known in the art, both naturally occurring and non-naturally occurring.

The term "recombinant polypeptide" is used herein to refer to polypeptides that have been
artificially designed and which comprise at least two polypeptide sequences that are not found as
contiguous polypeptide sequences in their initial natural environment, or to refer to polypeptides
which have been expressed from a recombinant polynucleotide.

As used herein, the term "non-human animal" refers to any non-human vertebrate, birds
and more usually mammals, preferably primates, farm animals such as swine, goats, sheep, donkeys,
and horses, rabbits or rodents, more preferably rats or mice. As used herein, the term "animal" is
used to refer to any vertebrate, preferably a mammal. Both the terms "animal" and "mammal"
expressly embrace human subjects unless preceded with the term "non-human".

As used herein, the term "antibody" refers to a polypeptide or group of polypeptides which
are comprised of at least one binding domain, where an antibody binding domain is formed from the
folding of variable domains of an antibody molecule to form three-dimensional binding spaces with
an internal surface shape and charge distribution complementary to the features of an antigenic
determinant of an antigen, which allows an immunological reaction with the antigen. Antibodies
include recombinant proteins comprising the binding domains, as well as fragments, including Fab,
Fab', F(ab)2, and F(ab')2 fragments.

As used herein, an "antigenic determinant" is the portion of an antigen molecule, in this
case a BAP28 polypeptide, that determines the specificity of the antigen-antibody reaction. An
"epitope" refers to an antigenic determinant of a polypeptide. An epitope can comprise as few as 3
amino acids in a spatial conformation which is unique to the epitope. Generally an epitope consists
of at least 6 such amino acids, and more usually at least 8-10 such amino acids. Methods for
determining the amino acids which make up an epitope include x-ray crystallography, 2-dimensional
nuclear magnetic resonance, and epitope mapping, e.g. the Pepscan method described by Geysen et
al. 1984; PCT Publication No WO 84/03564; and PCT Publication No WO 84/03506.

Throughout the present specification, the expression "nucleotide sequence" may be
employed to designate indifferently a polynucleotide or a nucleic acid. More precisely, the
expression "nucleotide sequence" encompasses the nucleic material itself and is thus not restricted to
the sequence information (i.e. the succession of letters chosen among the four base letters) that
biochemically characterizes a specific DNA or RNA molecule.

As used interchangeably herein, the terms "nucleic acids", "oligonucleotides", and
"polynucleotides" include RNA, DNA, or RNA/DNA hybrid sequences of more than one nucleotide
in either single chain or duplex form. The term "nucleotide" is used herein as an adjective to
describe molecules comprising RNA, DNA, or RNA/DNA hybrid sequences of any length in
single-stranded or duplex form. The term "nucleotide" is also used herein as a noun to refer to
individual nucleotides or varieties of nucleotides, meaning a molecule, or individual unit in a larger
nucleic acid molecule, comprising a purine or pyrimidine, a ribose or deoxyribose sugar moiety, and a
phosphate group, or phosphodiester linkage in the case of nucleotides within an oligonucleotide or
polynucleotide. The term "nucleotide" is also used herein to encompass "modified
nucleotides" which comprise at least one of the following modifications: (a) an alternative linking
group, (b) an analogous form of purine, (c) an analogous form of pyrimidine, or (d) an analogous
sugar; for examples of analogous linking groups, purines, pyrimidines, and sugars, see for example PCT
publication No WO 95/04064. The polynucleotide sequences of the invention may be prepared by
any known method, including synthetic, recombinant, ex vivo generation, or a combination thereof,
as well as utilizing any purification methods known in the art.

A sequence which is "operably linked" to a regulatory sequence such as a promoter means
that said regulatory element is in the correct location and orientation in relation to the nucleic acid to
control RNA polymerase initiation and expression of the nucleic acid of interest.

As used herein, the term "operably linked" refers to a linkage of polynucleotide elements
in a functional relationship. For instance, a promoter or enhancer is operably linked to a coding
sequence if it affects the transcription of the coding sequence.

The terms "trait" and "phenotype" are used interchangeably herein and refer to any visible,
detectable or otherwise measurable property of an organism such as symptoms of, or susceptibility
to, a disease for example. Typically the terms "trait" or "phenotype" are used herein to refer to
symptoms of, or susceptibility to, a disease, or a beneficial response to or side effects related to a
treatment. Preferably, said trait can be, without being limited to, cancers, developmental diseases,
and neurological diseases. More preferably, the term "trait" or "phenotype", when used herein,
encompasses, but is not limited to, prostate cancer, an early onset of prostate cancer, a beneficial
response to or side effects related to treatment or a vaccination against prostate cancer, a
susceptibility to prostate cancer, and the level of aggressiveness of prostate cancer tumors.

The term "allele" is used herein to refer to variants of a nucleotide sequence. A biallelic
polymorphism has two forms. Typically the first identified allele is designated as the original allele
whereas other alleles are designated as alternative alleles. The two alleles of a biallelic marker can
also be referred to as allele 1 and allele 2. Diploid organisms may be homozygous or heterozygous
for an allelic form.

The term "heterozygosity rate" is used herein to refer to the incidence of individuals in a
population which are heterozygous at a particular allele. In a biallelic system, the heterozygosity
rate is on average equal to 2P_a(1 - P_a), where P_a is the frequency of the least common allele. In
order to be useful in genetic studies, a genetic marker should have an adequate level of heterozygosity
to allow a reasonable probability that a randomly selected person will be heterozygous.
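As an illustration only, the heterozygosity formula given above can be evaluated with a few lines of Python. The function name and the sample frequencies are assumptions made for this sketch and do not come from the specification.

```python
def heterozygosity_rate(least_common_allele_freq: float) -> float:
    """Expected heterozygosity 2 * P_a * (1 - P_a) for a biallelic system.

    `least_common_allele_freq` is P_a, the frequency of the least common
    allele, so it must lie between 0 and 0.5 inclusive.
    """
    if not 0.0 <= least_common_allele_freq <= 0.5:
        raise ValueError("P_a must be between 0 and 0.5")
    return 2.0 * least_common_allele_freq * (1.0 - least_common_allele_freq)


if __name__ == "__main__":
    # Illustrative frequencies only (hypothetical values).
    for p_a in (0.01, 0.10, 0.20, 0.30, 0.50):
        print(f"P_a = {p_a:.2f}  heterozygosity = {heterozygosity_rate(p_a):.2f}")
```

The rate is largest when both alleles are equally frequent (P_a = 0.5), which is why markers with a frequent minor allele are the most informative.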

The term "genotype" as used herein refers to the identity of the alleles present in an
individual or a sample. In the context of the present invention, a genotype preferably refers to the
description of the biallelic marker alleles present in an individual or a sample. The term
"genotyping" a sample or an individual for a biallelic marker refers to determining the specific
allele or the specific nucleotide carried by an individual at a biallelic marker.

The term "polymorphism" as used herein refers to the occurrence of two or more
alternative genomic sequences or alleles between or among different genomes or individuals.
"Polymorphic" refers to the condition in which two or more variants of a specific genomic sequence
can be found in a population. A "polymorphic site" is the locus at which the variation occurs. A
single nucleotide polymorphism is the replacement of one nucleotide by another nucleotide at the
polymorphic site. Deletion of a single nucleotide or insertion of a single nucleotide also gives rise to
single nucleotide polymorphisms. In the context of the present invention, "single nucleotide
polymorphism" preferably refers to a single nucleotide substitution.

The terms "biallelic polymorphism" and "biallelic marker" are used interchangeably herein
to refer to a single nucleotide polymorphism having two alleles at a fairly high frequency in the
population. A "biallelic marker allele" refers to the nucleotide variants present at a biallelic marker
site. Typically, the frequency of the less common allele of the biallelic markers of the present
invention has been validated to be greater than 1%, preferably the frequency is greater than 10%,
more preferably the frequency is at least 20% (i.e. heterozygosity rate of at least 0.32), even more
preferably the frequency is at least 30% (i.e. heterozygosity rate of at least 0.42). A biallelic marker
wherein the frequency of the less common allele is 30% or more is termed a "high quality biallelic
marker".
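The frequency thresholds in the preceding paragraph can be sketched as a small classifier. This is a minimal sketch, assuming genotype counts are available for a sample of diploid individuals; the function names, tier labels, and counts are hypothetical and are not part of the specification.

```python
def less_common_allele_frequency(n_11: int, n_12: int, n_22: int) -> float:
    """Frequency of the less common allele from diploid genotype counts.

    n_11 and n_22 count homozygotes for allele 1 and allele 2,
    n_12 counts heterozygotes.
    """
    total_alleles = 2 * (n_11 + n_12 + n_22)
    if total_alleles == 0:
        raise ValueError("no genotyped individuals")
    freq_allele_2 = (n_12 + 2 * n_22) / total_alleles
    return min(freq_allele_2, 1.0 - freq_allele_2)


def marker_tier(freq: float) -> str:
    """Map a less-common-allele frequency onto the tiers named in the text."""
    if freq >= 0.30:
        return "high quality biallelic marker (at least 30%)"
    if freq >= 0.20:
        return "at least 20%"
    if freq > 0.10:
        return "greater than 10%"
    if freq > 0.01:
        return "greater than 1%"
    return "1% or less"


if __name__ == "__main__":
    # Hypothetical sample of 100 individuals.
    freq = less_common_allele_frequency(n_11=49, n_12=42, n_22=9)
    print(freq, marker_tier(freq))
```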

The location of nucleotides in a polynucleotide with respect to the center of the
polynucleotide is described herein in the following manner. When a polynucleotide has an odd
number of nucleotides, the nucleotide at an equal distance from the 3' and 5' ends of the
polynucleotide is considered to be "at the center" of the polynucleotide, and any nucleotide
immediately adjacent to the nucleotide at the center, or the nucleotide at the center itself, is
considered to be "within 1 nucleotide of the center." With an odd number of nucleotides in a
polynucleotide, any of the five nucleotide positions in the middle of the polynucleotide would be
considered to be within 2 nucleotides of the center, and so on. When a polynucleotide has an even
number of nucleotides, there would be a bond and not a nucleotide at the center of the
polynucleotide. Thus, either of the two central nucleotides would be considered to be "within 1
nucleotide of the center" and any of the four nucleotides in the middle of the polynucleotide would
be considered to be "within 2 nucleotides of the center", and so on.
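The odd and even cases described above can be made concrete with a short sketch. The function name and the use of 1-based positions counted from the 5' end are assumptions made for this illustration.

```python
from typing import List


def positions_within_center(length: int, n: int) -> List[int]:
    """Return the 1-based positions lying "within n nucleotides of the center".

    Odd length: the center is a nucleotide, so 2n + 1 positions qualify.
    Even length: the center is a bond, so 2n positions qualify
    (and none for n = 0).
    """
    if length < 1 or n < 0:
        raise ValueError("length must be >= 1 and n must be >= 0")
    if length % 2 == 1:
        center = (length + 1) // 2
        low, high = center - n, center + n
    else:
        # The bond sits between positions length/2 and length/2 + 1.
        low, high = length // 2 - n + 1, length // 2 + n
    # Clip to the ends of the polynucleotide.
    return list(range(max(low, 1), min(high, length) + 1))


if __name__ == "__main__":
    print(positions_within_center(25, 1))  # 3 positions around position 13
    print(positions_within_center(25, 2))  # 5 positions
    print(positions_within_center(24, 1))  # the 2 central positions
    print(positions_within_center(24, 2))  # 4 positions
```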

As used herein the term "BAP28-related biallelic marker" relates to a set of biallelic
markers in linkage disequilibrium with the BAP28 gene or a BAP28 nucleotide sequence. The term
"BAP28-related biallelic marker" relates to the biallelic markers located in a sequence selected from
the group consisting of SEQ ID Nos 1-4, and 18-31, a fragment thereof and/or the complementary
sequence thereto. The term BAP28-related biallelic marker encompasses the biallelic markers A1 to
A58 disclosed in Table 2 and any biallelic markers in linkage disequilibrium therewith.

The terms "complementary" or "complement thereof" are used herein to refer to the
sequences of polynucleotides which are capable of forming Watson & Crick base pairing with another
specified polynucleotide throughout the entirety of the complementary region. For the purpose of the
present invention, a first polynucleotide is deemed to be complementary to a second polynucleotide
when each base in the first polynucleotide is paired with its complementary base. Complementary
bases are, generally, A and T (or A and U), or C and G. "Complement" is used herein as a synonym
for "complementary polynucleotide", "complementary nucleic acid" and "complementary
nucleotide sequence". These terms are applied to pairs of polynucleotides based solely upon their
sequences and not any particular set of conditions under which the two polynucleotides would
actually bind.
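Because complementarity as defined here depends only on sequence, it can be checked mechanically. The sketch below assumes both sequences are written 5' to 3' and pair in the usual antiparallel orientation; that orientation convention and the function names are assumptions of this example, not statements from the specification.

```python
# Base pairs named in the text: A with T (or A with U), and C with G.
_PAIRS = {("A", "T"), ("T", "A"), ("A", "U"), ("U", "A"), ("C", "G"), ("G", "C")}
_DNA_COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}


def reverse_complement(dna: str) -> str:
    """Complement of a DNA sequence, returned 5' to 3'."""
    return "".join(_DNA_COMPLEMENT[base] for base in reversed(dna.upper()))


def is_complementary(first: str, second: str) -> bool:
    """True if every base of `first` pairs with the facing base of `second`.

    Both sequences are given 5' to 3'; `second` is read in reverse so that
    the strands face each other in antiparallel orientation.
    """
    first, second = first.upper(), second.upper()
    if len(first) != len(second):
        return False
    return all((a, b) in _PAIRS for a, b in zip(first, reversed(second)))


if __name__ == "__main__":
    print(reverse_complement("ATGC"))
    print(is_complementary("ATGC", "GCAT"))
    print(is_complementary("AUGC", "GCAT"))  # RNA strand against DNA strand
```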

Variants and Fragments

1- Polynucleotides 

The invention also relates to variants and fragments of the polynucleotides described 
herein, particularly of a BAP28 gene containing one or more biallelic markers according to the 
invention. 

Variants of polynucleotides, as the term is used herein, are polynucleotides that differ from
a reference polynucleotide. A variant of a polynucleotide may be a naturally occurring variant such
as a naturally occurring allelic variant, or it may be a variant that is not known to occur naturally.
Such non-naturally occurring variants of the polynucleotide may be made by mutagenesis
techniques, including those applied to polynucleotides, cells or organisms. Generally, differences
are limited so that the nucleotide sequences of the reference and the variant are closely similar
overall and, in many regions, identical.

Variants of polynucleotides according to the invention include, without being limited to,
nucleotide sequences which are at least 95% identical to a polynucleotide selected from the group
consisting of the nucleotide sequences of SEQ ID Nos 1-4, and 9-13 or to any polynucleotide
fragment of at least 12, 15, 18, 20, 25, 30, 50, 80, 100, 150, 200, 250, 300, 350, 400, 450, 500, 600
or 1000 consecutive nucleotides of a polynucleotide selected from the group consisting of the
nucleotide sequences of SEQ ID Nos 1-4 and 9-13, and preferably at least 99% identical, more
particularly at least 99.5% identical, and most preferably at least 99.8% identical to a polynucleotide
selected from the group consisting of the nucleotide sequences of SEQ ID Nos 1-4 and 9-13, or to
any polynucleotide fragment of at least 12, 15, 18, 20, 25, 30, 50, 80, 100, 150, 200, 250, 300, 350,
400, 450, 500, 600 or 1000 consecutive nucleotides of a polynucleotide selected from the group
consisting of the nucleotide sequences of SEQ ID Nos 1-4 and 9-13.

Nucleotide changes present in a variant polynucleotide may be silent, which means that
they do not alter the amino acids encoded by the polynucleotide. However, nucleotide changes may
also result in amino acid substitutions, additions, deletions, fusions and truncations in the
polypeptide encoded by the reference sequence. The substitutions, deletions or additions may
involve one or more nucleotides. The variants may be altered in coding or non-coding regions or
both. Alterations in the coding regions may produce conservative or non-conservative amino acid
substitutions, deletions or additions.

In the context of the present invention, particularly preferred embodiments are those in
which the polynucleotides encode polypeptides which retain substantially the same biological
function or activity as the mature BAP28 protein, or those in which the polynucleotides encode
polypeptides which maintain or increase a particular biological activity, while reducing a second
biological activity.

A polynucleotide fragment is a polynucleotide having a sequence that is entirely the same 
as part but not all of a given nucleotide sequence, preferably the nucleotide sequence of a BAP28 
gene, and variants thereof. The fragment can be a portion of an intron or an exon of a BAP28 gene.
It can also be a portion of the regulatory regions of BAP28. In some embodiments, the fragments 
may comprise at least one polymorphism or biallelic marker of the invention. 

Such fragments may be "free-standing", i.e. not part of or fused to other polynucleotides, 
or they may be comprised within a single larger polynucleotide of which they form a part or region. 
Indeed, several of these fragments may be present within a single larger polynucleotide.

In some embodiments, such fragments may comprise, consist of, or consist essentially of a 
contiguous span of at least 8, 10, 12, 15, 18, 20, 25, 35, 40, 50, 70, 80, 100, 250, 500 or 1000 
nucleotides in length. 

2- Polypeptides 

The invention also relates to variants, fragments, analogs and derivatives of the
polypeptides described herein, including mutated BAP28 proteins.

The variant may be 1) one in which one or more of the amino acid residues are substituted
with a conserved or non-conserved amino acid residue and such substituted amino acid residue may
or may not be one encoded by the genetic code, or 2) one in which one or more of the amino acid
residues includes a substituent group, or 3) one in which the mutated BAP28 is fused with another
compound, such as a compound to increase the half-life of the polypeptide (for example,
polyethylene glycol), or 4) one in which the additional amino acids are fused to the mutated BAP28,
such as a leader or secretory sequence or a sequence which is employed for purification of the
mutated BAP28 or a preprotein sequence. Such variants are deemed to be within the scope of those
skilled in the art.

A polypeptide fragment is a polypeptide having a sequence that is entirely the same as part
but not all of a given polypeptide sequence, preferably a polypeptide encoded by a BAP28 gene and 
variants thereof. 

In the case of an amino acid substitution in the amino acid sequence of a polypeptide
according to the invention, one or several amino acids can be replaced by "equivalent" amino acids.
The expression "equivalent" amino acid is used herein to designate any amino acid that may be
substituted for one of the amino acids having similar properties, such that one skilled in the art of
peptide chemistry would expect the secondary structure and hydropathic nature of the polypeptide to
be substantially unchanged. Generally, the following groups of amino acids represent equivalent
changes: (1) Ala, Pro, Gly, Glu, Asp, Gln, Asn, Ser, Thr; (2) Cys, Ser, Tyr, Thr; (3) Val, Ile, Leu,
Met, Ala, Phe; (4) Lys, Arg, His; (5) Phe, Tyr, Trp, His.
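The five groups listed above lend themselves to a simple lookup. The following sketch treats a substitution as "equivalent" when both residues appear together in at least one group; the function name and that membership rule are assumptions made for illustration, using standard three-letter amino acid codes.

```python
# The five groups of equivalent amino acids, in standard three-letter codes.
EQUIVALENT_GROUPS = [
    {"Ala", "Pro", "Gly", "Glu", "Asp", "Gln", "Asn", "Ser", "Thr"},
    {"Cys", "Ser", "Tyr", "Thr"},
    {"Val", "Ile", "Leu", "Met", "Ala", "Phe"},
    {"Lys", "Arg", "His"},
    {"Phe", "Tyr", "Trp", "His"},
]


def is_equivalent_change(original: str, replacement: str) -> bool:
    """True if both residues belong to at least one common group."""
    return any(
        original in group and replacement in group for group in EQUIVALENT_GROUPS
    )


if __name__ == "__main__":
    print(is_equivalent_change("Lys", "Arg"))  # both in group (4)
    print(is_equivalent_change("Ala", "Val"))  # both in group (3)
    print(is_equivalent_change("Lys", "Asp"))  # no shared group
```

Note that the groups overlap (for example, Ala appears in groups (1) and (3)), so the relation is not transitive: two residues can each be equivalent to a third without being equivalent to each other.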

A specific embodiment of a modified BAP28 peptide molecule of interest according to the
present invention includes, but is not limited to, a peptide molecule which is resistant to proteolysis,
for example a peptide in which the -CONH- peptide bond is modified and replaced by a (CH2NH)
reduced bond, a (NHCO) retro inverso bond, a (CH2-O) methylene-oxy bond, a (CH2-S) thiomethylene
bond, a (CH2CH2) carba bond, a (CO-CH2) ketomethylene bond, a (CHOH-CH2) hydroxyethylene
bond, a (N-N) bond, an E-alkene bond or also a -CH=CH- bond. The invention also encompasses a
human BAP28 polypeptide or a fragment or a variant thereof in which at least one peptide bond has
been modified as described above.

Such fragments may be "free-standing", i.e. not part of or fused to other polypeptides, or
they may be comprised within a single larger polypeptide of which they form a part or region.
Indeed, several fragments may be comprised within a single larger polypeptide.

As representative examples of polypeptide fragments of the invention, there may be 
mentioned those which have at least 6 amino acids, preferably at least 8 to 10 amino acids, more 
preferably at least 12, 15, 20, 25, 30, 40, 50, 100 or 200 amino acids long. A specific embodiment of 
a BAP28 fragment is a fragment containing at least one amino acid mutation in the BAP28 protein.

Identity Between Nucleic Acids Or Polypeptides 
The terms "percentage of sequence identity" and "percentage homology" are used
interchangeably herein to refer to comparisons among polynucleotides and polypeptides, and are
determined by comparing two optimally aligned sequences over a comparison window, wherein the
portion of the polynucleotide or polypeptide sequence in the comparison window may comprise
additions or deletions (i.e., gaps) as compared to the reference sequence (which does not comprise
additions or deletions) for optimal alignment of the two sequences. The percentage is calculated by
determining the number of positions at which the identical nucleic acid base or amino acid residue
occurs in both sequences to yield the number of matched positions, dividing the number of matched
positions by the total number of positions in the window of comparison and multiplying the result by
100 to yield the percentage of sequence identity. Homology is evaluated using any of the variety of
sequence comparison algorithms and programs known in the art. Such algorithms and programs
include, but are by no means limited to, TBLASTN, BLASTP, FASTA, TFASTA, and CLUSTALW
(Pearson and Lipman, 1988; Altschul et al., 1990; Thompson et al., 1994; Higgins et al., 1996;
Altschul et al., 1990; Altschul et al., 1993). In a particularly preferred embodiment, protein and
nucleic acid sequence homologies are evaluated using the Basic Local Alignment Search Tool
("BLAST") which is well known in the art (see, e.g., Karlin and Altschul, 1990; Altschul et al.,
1990, 1993, 1997). In particular, five specific BLAST programs are used to perform the following
tasks: (1) BLASTP and BLAST3 compare an amino acid query sequence against a protein sequence
database; (2) BLASTN compares a nucleotide query sequence against a nucleotide sequence
database; (3) BLASTX compares the six-frame conceptual translation products of a query
nucleotide sequence (both strands) against a protein sequence database; (4) TBLASTN compares a
query protein sequence against a nucleotide sequence database translated in all six reading frames
(both strands); and (5) TBLASTX compares the six-frame translations of a nucleotide query
sequence against the six-frame translations of a nucleotide sequence database.
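The percentage calculation described above (matched positions divided by the total number of positions in the comparison window, times 100) can be sketched as follows. This example assumes the two sequences have already been optimally aligned by some other tool and are passed in with "-" marking gaps; it does not perform the alignment itself, and the function name is hypothetical.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percentage of sequence identity over an already-aligned window.

    Both strings must have the same length; "-" marks a gap. A position
    counts as matched only when both sequences carry the same residue.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    if not aligned_a:
        raise ValueError("comparison window is empty")
    matched = sum(
        1
        for a, b in zip(aligned_a.upper(), aligned_b.upper())
        if a == b and a != "-"
    )
    return 100.0 * matched / len(aligned_a)


if __name__ == "__main__":
    # Hypothetical 10-position window: one gap and one mismatch.
    print(percent_identity("ACGT-ACGTA", "ACGTTACCTA"))
```

Gapped positions count toward the window length but never toward the matches, so adding gaps lowers the reported identity.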

The BLAST programs identify homologous sequences by identifying similar segments,
which are referred to herein as "high-scoring segment pairs," between a query amino acid or nucleic
acid sequence and a test sequence which is preferably obtained from a protein or nucleic acid sequence
database. High-scoring segment pairs are preferably identified (i.e., aligned) by means of a scoring
matrix, many of which are known in the art. Preferably, the scoring matrix used is the BLOSUM62
matrix (Gonnet et al., 1992; Henikoff and Henikoff, 1993). Less preferably, the PAM or PAM250
matrices may also be used (see, e.g., Schwartz and Dayhoff, eds., 1978). The BLAST programs
evaluate the statistical significance of all high-scoring segment pairs identified, and preferably
select those segments which satisfy a user-specified threshold of significance, such as a
user-specified percent homology. Preferably, the statistical significance of a high-scoring segment
pair is evaluated using the statistical significance formula of Karlin (see, e.g., Karlin and Altschul, 1990).

Stringent Hybridization Conditions

For the purpose of defining such a hybridizing nucleic acid according to the invention, the
stringent hybridization conditions are the following:

The hybridization step is carried out at 65°C in the presence of 6 x SSC buffer, 5 x Denhardt's
solution, 0.5% SDS and 100 µg/ml of salmon sperm DNA.

The hybridization step is followed by four washing steps:

- two washings of 5 min each, preferably at 65°C in a 2 x SSC and 0.1% SDS buffer;

- one washing of 30 min, preferably at 65°C in a 2 x SSC and 0.1% SDS buffer;

- one washing of 10 min, preferably at 65°C in a 0.1 x SSC and 0.1% SDS buffer,

these hybridization conditions being suitable for a nucleic acid molecule of about 20
nucleotides in length. Needless to say, the hybridization conditions described above are
to be adapted according to the length of the desired nucleic acid, following techniques well known to
one skilled in the art. The suitable hybridization conditions may for example be adapted
according to the teachings disclosed in the book of Hames and Higgins (1985).
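For readers who track protocols in software, the conditions above can be written down as plain data. This is only a sketch: the class and field names are hypothetical, and the values simply restate the figures given in the text for a nucleic acid of about 20 nucleotides.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class WashStep:
    minutes: int          # duration of the wash
    ssc_x: float          # SSC concentration, in "x" units
    sds_percent: float    # SDS concentration, in percent
    temp_c: int           # preferred temperature, in degrees Celsius


# Hybridization step as stated in the text (DNA concentration in micrograms per ml).
HYBRIDIZATION = {
    "temp_c": 65,
    "ssc_x": 6.0,
    "denhardt_x": 5.0,
    "sds_percent": 0.5,
    "salmon_sperm_dna_ug_per_ml": 100,
}

# The four washing steps, in order.
WASHES: List[WashStep] = [
    WashStep(minutes=5, ssc_x=2.0, sds_percent=0.1, temp_c=65),
    WashStep(minutes=5, ssc_x=2.0, sds_percent=0.1, temp_c=65),
    WashStep(minutes=30, ssc_x=2.0, sds_percent=0.1, temp_c=65),
    WashStep(minutes=10, ssc_x=0.1, sds_percent=0.1, temp_c=65),
]


def total_wash_minutes(steps: List[WashStep]) -> int:
    """Total washing time, ignoring handling time between steps."""
    return sum(step.minutes for step in steps)


if __name__ == "__main__":
    print(len(WASHES), "washes,", total_wash_minutes(WASHES), "min in total")
```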

Table A

Exon    Position in SEQ ID No 1 Intron    Position in SEQ ID No 1
        Beginning   End                   Beginning   End
1       4997        5076        1-2       5077        5370
2       5371        5544        2-3       5545        6120
3       6121        6337        3-4       6338        9876
4       9877        10018       4-5       10019       11521
5       11522       11623       5-6       11624       12520
6       12521       12661       6-7       12662       13452
7       13453       13664       7-8       13665       13823
8       13824       13957       8-9       13958       15375
9       15376       15478       9-10      15479       16854
10      16855       16965       10-11     16966       17377
11      17378       17495       11-12     17496       18534
12      18535       18642       12-13     18643       21445
13      21446       21541       13-14     21542       21998
14      21999       22087       14-15     22088       23035
15      23036       23247       15-16     23248       23545
16      23546       23667       16-17     23668       24269
17      24270       24461       17-18     24462       26286
18      26287       26470       18-19     26471       26610
19      26611       26747       19-20     26748       28067
20      28068       28260       20-21     28261       32539
21      32540       32709       21-22     32710       33111
22      33112       33270       22-23     33271       34585
23      34586       34828       23-24     34829       35155
24      35156       35287       24-25     35288       36659
25      36660       36763       25-26     36764       36933
26      36934       37077       26-27     37078       37802
27      37803       37921       27-28     37922       38016
28      38017       38138       28-29     38139       40364
29      40365       40493       29-30     40494       42617
30      42618       42848       30-31     42849       43451
31      43452       43578       31-32     43579       44835
32      44836       44999       32-33     45000       48222
33      48223       48269       33-34     48270       49655
34      49656       49779       34-35     49780       50357
35      50358       50498       35-36     50499       50963
36      50964       51256       36-37     51257       52147
37      52148       52298       37-38     52299       53234
38      53235       53393       38-39     53394       53553
39      53554       53688       39-40     53689       53837
40      53838       53942       40-41     53943       54028
41      54029       54197       41-42     54198       54740
42      54741       54895       42-43     54896       55753
43      55754       55912       43-44     55913       57385
44      57386       57494       44-45     57495       58503
45      58504       58827       45-B'     58828       85946
45b     58504       59354       45b-B'    59355       85946
B'      85947       86168       B'-A'     86169       91228
A'      91229       91851

Genomic Sequences Of The Human BAP28 Gene 

The present invention concerns the genomic sequence of BAP28 comprising the sequence
of SEQ ID No 1. The present invention encompasses the BAP28 gene, or BAP28 genomic sequences
consisting of, consisting essentially of, or comprising a sequence selected from the group consisting
of SEQ ID No 1, a sequence complementary thereto, as well as fragments and variants thereof.
These polynucleotides may be purified, isolated, or recombinant.

BAP28 was localized by the present inventors to the chromosome 1q43 region.

The human BAP28 genomic nucleic acid comprises at least 47 exons. The exon positions
in SEQ ID No 1 are detailed in Table A.
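The coordinates in Table A are inclusive and alternate between exons and introns, so each segment should begin one position after the previous one ends. The sketch below checks that property on the first rows of the table; the function names are hypothetical and only a few rows are copied in to keep the example short.

```python
from typing import List, Tuple

Segment = Tuple[str, int, int]  # (name, beginning, end), 1-based inclusive

# First rows of Table A, interleaved in genomic order.
SEGMENTS: List[Segment] = [
    ("exon 1", 4997, 5076),
    ("intron 1-2", 5077, 5370),
    ("exon 2", 5371, 5544),
    ("intron 2-3", 5545, 6120),
    ("exon 3", 6121, 6337),
    ("intron 3-4", 6338, 9876),
    ("exon 4", 9877, 10018),
]


def segment_length(beginning: int, end: int) -> int:
    """Length in nucleotides of an inclusive coordinate range."""
    return end - beginning + 1


def find_discontinuities(segments: List[Segment]) -> List[Tuple[str, str]]:
    """Return neighboring segment pairs that leave a gap or overlap."""
    problems = []
    for (name_a, _, end_a), (name_b, begin_b, _) in zip(segments, segments[1:]):
        if begin_b != end_a + 1:
            problems.append((name_a, name_b))
    return problems


if __name__ == "__main__":
    for name, beginning, end in SEGMENTS:
        print(f"{name}: {segment_length(beginning, end)} nt")
    print("discontinuities:", find_discontinuities(SEGMENTS))
```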

The exons B' and A' of the BAP28 gene have been found through the study of the PCTA-1
gene which is described in the PCT application WO 99/64590, incorporated herein by reference.

One public cDNA (Genbank Accession Number AF074001) shows an additional 5' exon in
comparison with the cDNA described in the above-referenced application. This exon has been called
exon B. It does not seem to comprise a 5' splice site, so this exon would be a first exon. Long range
PCR experiments with a first pair of primers PCTAexBLF33/PCTA5Ra230 (SEQ ID No 40/SEQ
ID No 45) and a second one PCTAexBLF120n/PCTA5Ra220n (SEQ ID No 41/SEQ ID No 44)
confirm the existence of a cDNA comprising at least the exon B and the exons 0, 1, and 2 (SEQ ID
No 10).

Three additional exons have also been identified, namely exons A, C and D. Exon C is the
most upstream exon. Exons A and D have a 5' splice site. Long range PCR with a first pair of
primers PCTAexALF12/PCTAex9terLR330 (SEQ ID No 36/SEQ ID No 53) and a second one
PCTAexALF13n/PCTAex9terLR325n (SEQ ID No 37/SEQ ID No 54) showed an alternative
PCTA-1 cDNA consisting of the exons A, 0, 1, 2, 3, 9bis and 9ter (SEQ ID No 13). Other
alternative PCTA-1 cDNAs comprise consecutively the exons A, D, 0, 1, and 2 (SEQ ID No 12), the
exons A, 1 and 2 (SEQ ID No 11), or the exons C and A (SEQ ID No 9). The forms AD012 and A12
have been amplified with the first pair of primers PCTAexALF12/PCTA5Ra230 (SEQ ID No
36/SEQ ID No 45) and the second one PCTAexALF13n/PCTA5Ra220n (SEQ ID No 37/SEQ ID
No 44). Exon C has been identified by a RACE experiment with the PCTAexALR60 primer (SEQ
ID No 38) from the exon A. Figure 2 shows the alternative cDNAs of PCTA-1 and the
alternative 5' ends of PCTA-1 cDNAs.

The first identified BAP28 cDNAs comprise either the exons 1 to 45 or 1 to 44 and 45b.
They are detailed in the section "BAP28 cDNA sequences". The exon 45 of the BAP28 cDNA
comprises a polyadenylation site and RACE experiments failed to show any additional
sequence downstream of the exon 45, which was the last identified exon.

The study of the new PCTA-1 exons for an alternative cDNA comprising both the exons A
and B provided two additional BAP28 exons, the exons A' and B'. Indeed, two upstream PCR
primers were designed: one in the exon A (PCTAexALF12 (SEQ ID No 36) followed by
PCTAexALF13n (SEQ ID No 37)) and the other in exon B (PCTAexBLF33 (SEQ ID No 40)
followed by PCTAexBLF120n (SEQ ID No 41)). The downstream primer was generated in
previously identified PCTA-1 exons (PCTA5Ra230 (SEQ ID No 45) followed by PCTA5Ra220n
(SEQ ID No 44)). No alternative cDNA comprising both exons has been observed. Therefore, two
pairs of primers were designed with the upstream primer in exon A and the downstream primer in
exon B. More particularly, the amplification was done with a first pair of primers
PCTAexALF12/PCTAexBLR140 (SEQ ID No 36/SEQ ID No 42) and a second one
PCTAexALF13n/PCTAexBLR40n (SEQ ID No 37/SEQ ID No 43). An amplification product was
obtained. However, the exons were slightly shifted and the splice sites were only available on the
opposite strand. Therefore, the amplification product was not from the PCTA-1 gene but rather
was presumed to be from the BAP28 gene, which is on the opposite strand. This amplification
product contains the exons A' and B' (SEQ ID No 4). In order to check that the amplification
product comes from BAP28, a PCR amplification was performed with a downstream primer in the
exon A and an upstream primer in exon 43 of the BAP28 gene. More particularly, the PCR was done
with a first pair of primers PCTAexALF12/BAP283Ra6283 (SEQ ID No 36/SEQ ID No 32) and
a second one PCTAexALF13n/BAP283Ra6324n (SEQ ID No 37/SEQ ID No 33). The amplification
product confirmed that the slightly shifted exons A and B are part of the BAP28 cDNA. The
sequencing of the amplification product showed a cDNA comprising the exons 44, 45b, and A. The
BAP28 cDNA with the exons B' and A' likely corresponds to another alternative cDNA form.

Thus, the invention embodies purified, isolated, or recombinant polynucleotides
comprising a nucleotide sequence selected from the group consisting of the exons of the BAP28
gene, or a sequence complementary thereto. Preferred are nucleotide sequences selected from the
group consisting of the exons of the BAP28 gene having the nucleotide position ranges listed in
Table A, or a complementary sequence thereto or a fragment or a variant thereof.

Encompassed by the invention are purified, isolated, or recombinant nucleic acids
comprising a combination of at least two exons of the BAP28 gene, wherein the polynucleotides are
arranged within the nucleic acid, from the 5'-end to the 3'-end of said nucleic acid, in the same order
as in SEQ ID No 1. The invention further deals with purified, isolated, or recombinant nucleic acids
comprising a combination of at least two exons of the BAP28 gene, wherein the nucleic acids
comprise at least one exon selected from the group consisting of exons 1 to 45, 45b, B' and A',
wherein the polynucleotides are arranged within the nucleic acid, from the 5'-end to the 3'-end of
said nucleic acid, in the same order as in SEQ ID No 1.

Preferred polynucleotides of the invention embody purified, isolated, or recombinant
polynucleotides comprising a contiguous span of at least 12, 15, 18, 20, 25, 30, 50, 80, 100, 150, or
200 nucleotides, to the extent that such a length is consistent with the lengths of the particular
nucleotide position, of SEQ ID No 1 or the complement thereof, wherein said contiguous span
comprises at least 1, 2, 3, 5, 10, 20, 30, 40 or 50 nucleotides selected from the group consisting of
the following nucleotide positions of SEQ ID No 1: 4997-5076, 5371-5544, 6121-6337,
9877-10018, 11522-11623, 12521-12661, 13453-13664, 13824-13957, 15376-15478, 16855-16965,
17378-17495, 18535-18642, 21446-21541, 21999-22087, 23036-23247, 23546-23667,
24270-24461, 26287-26470, 26611-26747, 28068-28260, 32540-32709, 33112-33270, 34586-34828,
35156-35287, 36660-36763, 36934-37077, 37803-37921, 38017-38138, 40365-40493,
42618-42848, 43452-43578, 44836-44999, 48223-48269, and 49656-49779.
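Whether a given contiguous span contains the required number of nucleotides from the listed position ranges is a simple interval-overlap count. The sketch below illustrates this with the first three ranges only; the function name and the example spans are hypothetical.

```python
from typing import List, Tuple

# First three of the position ranges listed above (1-based, inclusive).
EXON_RANGES: List[Tuple[int, int]] = [(4997, 5076), (5371, 5544), (6121, 6337)]


def positions_in_ranges(
    span_begin: int, span_end: int, ranges: List[Tuple[int, int]]
) -> int:
    """Count the positions of an inclusive span that fall inside the ranges.

    Assumes the ranges do not overlap one another, as is the case for the
    exon positions used here.
    """
    if span_end < span_begin:
        raise ValueError("span_end must not be smaller than span_begin")
    total = 0
    for begin, end in ranges:
        overlap = min(span_end, end) - max(span_begin, begin) + 1
        if overlap > 0:
            total += overlap
    return total


if __name__ == "__main__":
    # Hypothetical 20-nucleotide span straddling the end of the first range.
    print(positions_in_ranges(5070, 5089, EXON_RANGES))
    # Hypothetical 12-nucleotide span lying entirely inside the first range.
    print(positions_in_ranges(5000, 5011, EXON_RANGES))
```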

The position of the introns is detailed in Table A. Thus, the invention embodies purified,
isolated, or recombinant polynucleotides comprising a nucleotide sequence selected from the group
consisting of the introns of the BAP28 gene, or a sequence complementary thereto.

The invention also encompasses purified, isolated, or recombinant polynucleotides
comprising a nucleotide sequence having at least 70, 75, 80, 85, 90, or 95% nucleotide identity with
a nucleotide sequence of SEQ ID No 1 or a complementary sequence thereto or a fragment thereof.
The nucleotide differences with respect to the nucleotide sequence of SEQ ID No 1 may be generally
randomly distributed throughout the entire nucleic acid. Nevertheless, preferred nucleic acids are
those wherein the nucleotide differences with respect to the nucleotide sequence of SEQ ID No 1 are
predominantly located outside the coding sequences contained in the exons. These nucleic acids, as
well as their fragments and variants, may be used as oligonucleotide primers or probes in order to
detect the presence of a copy of the BAP28 gene in a test sample, or alternatively in order to amplify
a target nucleotide sequence within the BAP28 sequences.
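As a rough illustration of a nucleotide identity percentage, the sketch below compares two sequences that are assumed to be already aligned and of equal length, without gaps. This is a simplification for illustration only; it is not the identity calculation defined elsewhere in the specification, and the function name is hypothetical.

```python
def percent_identity(seq_a: str, seq_b: str) -> float:
    """Percent of identical positions between two pre-aligned, equal-length sequences."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be pre-aligned to the same length")
    if not seq_a:
        raise ValueError("sequences must not be empty")
    # Case-insensitive position-by-position comparison; gaps are not handled.
    matches = sum(1 for x, y in zip(seq_a.upper(), seq_b.upper()) if x == y)
    return 100.0 * matches / len(seq_a)
```

Under these assumptions, two 10-nucleotide sequences differing at a single position have 90% identity, which would meet the 70 to 90% thresholds listed above but not the 95% threshold.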

Another object of the invention consists of a purified, isolated, or recombinant nucleic
acid that hybridizes with a nucleotide sequence selected from the group consisting of SEQ ID No 1
or a complementary sequence thereto or a variant thereof, under the stringent hybridization
conditions as defined above.

Particularly preferred nucleic acids of the invention include isolated, purified, or
recombinant polynucleotides comprising a contiguous span of at least 12, 15, 18, 20, 25, 30, 50, 80,
100, 150, 200, 250, 300, 350, 400, 450, 500, 600 or 1000 nucleotides, to the extent that such a length
is consistent with the lengths of the particular nucleotide position, of SEQ ID No 1 or the
complement thereof, wherein said contiguous span comprises at least 1, 2, 3, 5, 10, 20, 30, 40 or 50
of the following nucleotide positions of SEQ ID No 1: 1-50357, 50499-50963, 51257-52147,
52299-53234, 53394-53553, 53689-53837, 53943-54028, 54198-54740, 54896-55753, 55913-57385,
57495-58503, 58828-85946, 59355-85946, 86169-91228, and/or 91852 to 97662.

Further preferred nucleic acids of the invention include isolated, purified, or recombinant
polynucleotides comprising a contiguous span of at least 12, 15, 18, 20, 25, 30, 50, 80, 100, 150,
200, 250, 300, 350, 400, 450, 500, 600 or 1000 nucleotides, to the extent that such a length is
consistent with the lengths of the particular nucleotide position, of SEQ ID No 1 or the complement
thereof, wherein said contiguous span comprises at least 1, 2, 3, 5, 10, 20, 30, 40 or 50 of the
following nucleotide positions of SEQ ID No 1: 1-2500, 2501-5000, 5001-7500, 7501-10000,
10001-12500, 12501-15000, 15001-17500, 17501-20000, 20001-22500, 22501-25000, 25001-27500,
27501-30000, 30001-32500, 32501-35000, 35001-37500, 37501-40000, 40001-42500,
42501-45000, 45001-47500, 47501-50000, 50001-50357, 50499-50963, 51257-52147, 52299-53234,
53394-53553, 53689-53837, 53943-54028, 54198-54740, 54896-55753, 55913-57385,
57495-58503, 58828-85946, 59355-85946, 86169-91228, and/or 91852 to 97662.

Other preferred nucleic acids of the invention include isolated, purified, or recombinant
polynucleotides comprising a contiguous span of at least 12, 15, 18, 20, 25, 30, 35, 40, 50, 60, 70,
80, 90, 100, 150, 200, 500, or 1000 nucleotides, to the extent that such a length is consistent with the
lengths of the particular nucleotide position, of SEQ ID No 1, or the complements thereof, wherein
said contiguous span comprises at least one BAP28-related biallelic marker selected from the group
consisting of A1 to A58, preferably A1 to A27, A34, A37 to A41, A43 to A49, A52, and A54 to
A58, more preferably at least one of the biallelic markers A1, A4, A16, A30, A31, A42, A50, A51,
and A53.

It should be noted that nucleic acid fragments of any size and sequence may also be
comprised by the polynucleotides described in this section.

In another aspect, the invention concerns polymorphisms of BAP28. 
While this section is entitled "Genomic Sequences of BAP28," it should be noted that
nucleic acid fragments of any size and sequence may also be comprised by the polynucleotides
described in this section, flanking the genomic sequences of BAP28 on either side or between two or
more such genomic sequences.

BAP28 cDNA Sequences 

Another object of the invention is a purified, isolated, or recombinant nucleic acid
comprising a nucleotide sequence selected from the group consisting of SEQ ID Nos 2 and 3,
complementary sequences thereto, as well as allelic variants, and fragments thereof. Moreover,
preferred polynucleotides of the invention include purified, isolated, or recombinant BAP28 cDNAs
consisting of, consisting essentially of, or comprising a nucleotide sequence selected from the group
consisting of SEQ ID Nos 2 and 3. The two BAP28 cDNAs differ at their 3' ends. The first one,
namely the cDNA of SEQ ID No 2, comprises exons 1 to 44 and 45. The second one, namely
the cDNA of SEQ ID No 3, comprises exons 1 to 44, 45b and A'. The cDNAs of SEQ ID Nos
2 and 3 are described in Table B.

Consequently, the invention concerns purified, isolated, and recombinant nucleic acids
comprising a nucleotide sequence of the 5'UTR of the BAP28 cDNA, a sequence complementary
thereto, or an allelic variant thereof. The invention also concerns purified, isolated, and
recombinant nucleic acids comprising a nucleotide sequence of the 3'UTR of the BAP28 cDNA, a
sequence complementary thereto, or an allelic variant thereof.



Table B

cDNA      Position range of 5'UTR    Position range of ORF    Position range of 3'UTR
cDNA1     1 to 112                   113 to 6547              6548 to 6782
cDNA2     1 to 112                   113 to 6547              6548 to 7932

As described in the section "Genomic Sequences of the human Bap28 gene", an alternative
form of the BAP28 cDNA comprises the exons B' and A'. Therefore, the invention concerns a
cDNA of BAP28 comprising the nucleotide sequence of SEQ ID No 4.

Particularly preferred embodiments of the invention include isolated, purified, or
recombinant polynucleotides comprising a contiguous span of at least 12, 15, 18, 20, 25, 30, 50, 80,
100, 150, 200, 250, 300, 350, 400, 450, 500, 600 or 1000 nucleotides of a nucleic acid sequence
selected from the group consisting of SEQ ID Nos 2 and 3 or the complements thereof, wherein said
contiguous span comprises at least 1, 2, 3, 5, or 10 of nucleotide positions 1 to 4995 of SEQ ID No 2
or 3. Further preferred polynucleotides include isolated, purified, or recombinant polynucleotides
comprising a contiguous span of at least 12, 15, 18, 20, 25, 30, 50, 80, 100, 150, 200, 250, 300, 350,
400, 450, 500, 600 or 1000 nucleotides of a nucleic acid sequence selected from the group consisting
of SEQ ID Nos 2 and 3 or the complements thereof, wherein said contiguous span comprises at least
1, 2, 3, 5, or 10 of the following nucleotide positions of SEQ ID No 2 or 3: 1 to 2033, 2160 to 2348,
and 2676 to 4995. Additional preferred nucleic acids of the invention include isolated, purified, or
recombinant polynucleotides comprising a contiguous span of at least 12, 15, 18, 20, 25, 30, 50, 80,
100, 150, 200, 250, 300, 350, 400, 450, 500, 600 or 1000 nucleotides of SEQ ID No 2, or the
complements thereof, wherein said contiguous span comprises at least 1, 2, 3, 5, or 10 nucleotide
positions of any one of the following ranges of nucleotide positions of SEQ ID No 2: 1 to 500, 501
to 1000, 1001 to 1500, 1501 to 2000, 2001 to 2500, 2501 to 3000, 3001 to 3500, 3501 to 4000, 4001
to 4500, 4501 to 4995, 5000 to 5500, 5501 to 6000, 6001 to 6500, and 6501 to 6782. Additional
preferred nucleic acids of the invention include isolated, purified, or recombinant polynucleotides
comprising a contiguous span of at least 12, 15, 18, 20, 25, 30, 50, 80, 100, 150, 200, 250, 300, 350,
400, 450, 500, 600 or 1000 nucleotides of SEQ ID No 3, or the complements thereof, wherein said
contiguous span comprises at least 1, 2, 3, 5, or 10 nucleotide positions of any one of the following
ranges of nucleotide positions of SEQ ID No 3: 1 to 500, 501 to 1000, 1001 to 1500, 1501 to 2000,
2001 to 2500, 2501 to 3000, 3001 to 3500, 3501 to 4000, 4001 to 4500, 4501 to 4995, 5000 to 5500,
5501 to 6000, 6001 to 6500, 6501 to 7000, 7001 to 7500, and 7501 to 7932.

The invention also pertains to a purified or isolated nucleic acid having at least 95%
nucleotide identity with a nucleotide sequence selected from the group consisting of SEQ ID Nos 2
and 3 or a fragment thereof or a complementary sequence thereto, advantageously 99%, preferably
99.5% nucleotide identity and most preferably 99.8% nucleotide identity with a nucleotide sequence
selected from the group consisting of SEQ ID Nos 2 and 3 or a fragment thereof or a complementary
sequence thereto.

Another object of the invention consists of a purified, isolated, or recombinant nucleic
acid that hybridizes with a nucleotide sequence selected from the group consisting of SEQ ID Nos 2
and 3 or a complementary sequence thereto or a variant thereof, under the stringent hybridization
conditions as defined above.

The invention concerns a PCTA-1 cDNA comprising an exon selected from the group
consisting of exons A, B, C, and D. More particularly, the invention concerns a PCTA-1 cDNA
comprising a polynucleotide sequence selected from the group consisting of SEQ ID Nos 9-13 or a
fragment thereof or a complementary sequence thereto.

Encompassed by the invention are purified, isolated, or recombinant nucleic acids
comprising a combination of at least two exons of the PCTA-1 gene, wherein the polynucleotides are
arranged within the nucleic acid, from the 5'-end to the 3'-end of said nucleic acid, in the same order
as in SEQ ID No 1. The invention further deals with purified, isolated, or recombinant nucleic acids
comprising a combination of at least two exons of the PCTA-1 gene, wherein the nucleic acids
comprise at least one exon selected from the group consisting of exons C, A, D, B, 0, 1, 2, 3, 4, 5, 6,
6bis, 7, 8, 9, 9bis and 9ter, wherein the polynucleotides are arranged within the nucleic acid, from
the 5'-end to the 3'-end of said nucleic acid, in the same order as in SEQ ID No 1.

While this section is entitled "BAP28 cDNA Sequences," it should be noted that nucleic
acid fragments of any size and sequence may also be comprised by the polynucleotides described in
this section, flanking the genomic sequences of BAP28 on either side or between two or more such
genomic sequences.

NATURAL ANTISENSE 

Over the last 10 years, an increasing number of natural antisense RNAs have been reported
in eukaryotes. Natural antisense RNAs are endogenous transcripts that exhibit complementary
sequences to other transcripts, named sense transcripts. Most antisense transcripts originate from
the same locus as sense transcripts. Transcribed from opposite strands of DNA, sense and antisense
transcripts overlap each other at least partially, and display perfect complementarity. The reported
antisense RNAs are complementary to sense transcripts encoding proteins involved in extremely
diverse biological functions: hormonal response, control of proliferation, development, structure,
etc.

In some cases, apart from their capability of encoding proteins per se, antisense RNAs
were found to regulate, generally downregulate, the expression of their sense counterparts. Often
changes in sense gene expression were correlated with the presence of antisense RNA. Indeed, an
inverse relationship between levels of accumulation of sense and antisense messengers has been
documented in several cases. Some examples have been reported in various pathologies such as
nervous disorders and cancer.

These characteristics suggest that antisense transcripts are found throughout the whole
eukaryotic world and might play a role in general antisense-mediated gene regulation, as is the case
in prokaryotes. Indeed, antisense-mediated gene regulation is a way of decreasing the abundance of
stable transcripts more rapidly than the cessation of transcription. In addition, natural antisense
transcripts are thought to be involved not only in the normal regulation of gene expression but also
in the alteration of gene regulation leading to different pathologies.

Indeed, because of their complementarity, antisense transcripts may hybridize to sense
transcripts and thus modify the expression of their sense counterparts at any step from transcription
to translation.
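The complementarity underlying this hybridization can be sketched in a few lines. The example below is purely illustrative: it uses a made-up six-nucleotide RNA rather than any BAP28 or PCTA-1 sequence, considers only Watson-Crick pairs (A-U, G-C), and the function names are hypothetical.

```python
# Watson-Crick complement table for RNA (upper case only).
_RNA_COMPLEMENT = str.maketrans("ACGU", "UGCA")


def reverse_complement_rna(seq: str) -> str:
    """Return the reverse complement of an RNA sequence, read 5' to 3'."""
    return seq.upper().translate(_RNA_COMPLEMENT)[::-1]


def is_perfect_antisense(sense: str, candidate: str) -> bool:
    """True if candidate pairs with sense over its full length with no mismatch."""
    return reverse_complement_rna(sense) == candidate.upper()
```

With the toy sense sequence AUGGCC, the perfectly complementary antisense sequence read 5' to 3' is GGCCAU.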

In the nucleus, antisense RNA may regulate sense expression at the level of
transcription, processing, or nucleocytoplasmic transport. Transcriptional regulation occurs either
because the activity of sense and antisense promoters is differentially regulated by cellular
conditions or because antisense transcription impedes sense transcription. This interference would
involve the collision of two transcription complexes, resulting in premature termination or in
reduced elongation of transcription, the transcripts with the highest rate of transcription being
predominant. Antisense may also operate at a post-transcriptional level, probably by impairing
maturation and/or transport of the sense transcript.

Although some examples have shown that antisense regulation may occur in the nucleus,
antisense regulation is generally described as a cytoplasmic event operating mostly at the messenger
stability level. Furthermore, the regulation can also be made at the translation stage, particularly
when interactions between sense and antisense occur in the 3'UTR.

Two mechanisms of antisense-mediated gene regulation may be envisioned. First,
antisense transcripts displaying very similar structural features to sense transcripts may bind proteins
actually interacting with their sense counterparts, thus depriving sense messengers of proteins
necessary for their functions. The other mechanism of antisense-mediated regulation is thought to
operate via duplex formation between complementary sense and antisense transcripts. By simple
steric hindrance, RNA duplexes would prevent sense RNA from interacting with diverse cellular
components required for normal sense expression, thus impairing maturation, nucleocytoplasmic
transport, transcript stability, or translation depending on the cellular components involved.
Alternatively, duplexes may represent substrates for double-stranded RNA specific enzymes. It is
commonly believed that most duplexes will become targeted for degradation by RNases and only
the most abundant transcripts, either sense or antisense, will persist in the cells. More information on
natural antisense transcripts can be found in Vanhee-Brossollet et al. (1998).

BAP28 and PCTA-1 are natural antisense

The BAP28 transcript has been identified as a natural antisense of the PCTA-1 transcript.
Indeed, the coding sequence of PCTA-1 is on the opposite strand of the coding sequence of BAP28.
Moreover, the 3'UTR of BAP28 contains some sequences which are complementary to segments of
the 5'UTR and 3'UTR of PCTA-1. More particularly, exons A and B are common to the PCTA-1
and BAP28 genes, exon 44 of the BAP28 gene is antisense to exons 9 and 9ter of PCTA-1, and
exons 45 and 45b of the BAP28 gene are antisense to exon 9 of PCTA-1. Therefore, the BAP28
transcript is the antisense of the PCTA-1 RNA. Figure 1 presents the general organization of the
BAP28 and PCTA-1 genes.

The PCTA-1 protein has been shown to be a specific antigen of prostate cancer cells (WO
96/21671, incorporated herein by reference). Therefore, one can assume that its expression is closely
linked to the development of cancer, particularly prostate cancer.

ESTs from the PCTA-1 gene were found in a broad range of tissues. As the PCTA-1 protein
is only present in prostate cancer cells, regulation of the PCTA-1 RNA must occur, possibly at
the stage of RNA transcription, splicing, stability and/or translation.

The 5'UTR and 3'UTR regions of a gene are of particular importance in that they often
comprise regulatory elements which can play a role in providing appropriate expression levels,
particularly through the control of mRNA stability.

As the BAP28 transcript is the natural antisense of the PCTA-1 mRNA, the BAP28 mRNA
is likely to be involved in the regulation of PCTA-1 expression and, consequently, in the
process of development of prostate cancer.

The involvement of the BAP28 gene in prostate cancer is supported by the clearly
significant association of the BAP28-related biallelic markers with prostate cancer. Furthermore,
PCT application WO 98/12327, incorporated herein by reference, showed that BAP28 should be
involved in interaction with BRCA1. Therefore, BAP28 may be a tumor suppressor. During the
process of carcinogenesis, BAP28 would become inactive and its expression could decrease. This
decrease in BAP28 expression would lead to an increase in PCTA-1 mRNA stability and to the
presence of the PCTA-1 protein at the cell surface. We can hypothesize that these events correspond
to a natural defense against the cancer cells.

Consequently, the invention concerns the use of a BAP28 nucleotide sequence from the
mRNA as an antisense in order to control PCTA-1 expression, and preferably to inhibit PCTA-1
expression. The invention also concerns the use of a PCTA-1 nucleotide sequence from the mRNA as
an antisense in order to control BAP28 expression. These antisense sequences can be used in order
to prevent cancer development, preferably prostate cancer development.

An embodiment of the invention concerns the polynucleotide segment common to the
PCTA-1 and BAP28 cDNAs. More particularly, the invention concerns isolated, purified, or
recombinant polynucleotides comprising a contiguous span of at least 12, 15, 18, 20, 25, 30, 50, 80,
100, 150, 200, 250, 300, 350, 400, 450, 500, 600 or 1000 nucleotides of SEQ ID No 1, or the
complements thereof, wherein said contiguous span comprises at least 1, 2, 3, 5, or 10 nucleotide
positions of any one of the following ranges of nucleotide positions of SEQ ID No 1: 57386-57494,
58504-59354, 85947-86108, and 91259-91325.

An additional embodiment is the use of a polynucleotide according to the invention, more
particularly polynucleotides comprising a contiguous span of at least 12, 15, 18, 20, 25, 30, 50, 80,
100, 150, 200, 250, 300, 350, 400, 450, 500, 600 or 1000 nucleotides of SEQ ID No 1, or the
complements thereof, wherein said contiguous span comprises at least 1, 2, 3, 5, or 10 nucleotide
positions of any one of the following ranges of nucleotide positions of SEQ ID No 1: 57386-57494,
58504-59354, 85947-86108, and 91259-91325, for regulating the expression of PCTA-1 and/or
BAP28.

Coding Regions

The BAP28 open reading frame is contained in the corresponding mRNAs of SEQ ID No
2 or 3. More precisely, the effective BAP28 coding sequence (CDS) includes the region between
nucleotide position 113 (first nucleotide of the ATG codon) and nucleotide position 6547 (end
nucleotide of the TAA codon) of SEQ ID No 2 or 3.
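The arithmetic implied by these coordinates can be checked with a short sketch. It assumes 1-based, inclusive positions; the helper names and the toy sequence in the usage note are hypothetical and are not taken from the sequence listing.

```python
CDS_START = 113   # first nucleotide of the ATG codon (SEQ ID No 2 or 3)
CDS_END = 6547    # last nucleotide of the TAA codon (SEQ ID No 2 or 3)


def region_length(start: int, end: int) -> int:
    """Length of a region given 1-based, inclusive start and end positions."""
    return end - start + 1


def extract_region(sequence: str, start: int, end: int) -> str:
    """Slice a 1-based, inclusive region out of a sequence string."""
    return sequence[start - 1:end]


cds_length = region_length(CDS_START, CDS_END)   # 6435 nucleotides
codon_count = cds_length // 3                    # 2145 codons, stop codon included
```

The CDS length of 6435 nucleotides is a multiple of three, which is consistent with an uninterrupted reading frame of 2145 codons including the stop codon. As a usage example on a made-up sequence, `extract_region("NNATGAAATAANNN", 3, 11)` returns `"ATGAAATAA"`.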

Thus, the present invention deals with a purified or isolated nucleic acid encoding a
BAP28 protein or a fragment thereof. More particularly the present invention deals with a purified or
isolated nucleic acid encoding a BAP28 protein having the amino acid sequence of SEQ ID No 5 or
a peptide fragment or variant thereof. The present invention also embodies isolated, purified, and
recombinant polynucleotides which encode polypeptides comprising a contiguous span of at least
6 amino acids, preferably at least 8 or 10 amino acids, more preferably at least 12, 15, 20, 25, 30, 40,
50, or 100 amino acids of SEQ ID No 5, wherein said contiguous span includes at least 1, 2, 3, 5 or
10 of the amino acid positions 1 to 1629 of SEQ ID No 5. The present invention further
embodies isolated, purified, and recombinant polynucleotides which encode polypeptides
comprising a contiguous span of at least 6 amino acids, preferably at least 8 or 10 amino acids, more
preferably at least 12, 15, 20, 25, 30, 40, 50, or 100 amino acids of SEQ ID No 5, wherein said
contiguous span contains an amino acid selected from the group consisting of an asparagine at
amino acid position 1694 of SEQ ID No 5, a valine at amino acid position 1854 of SEQ ID No 5,
an asparagine at amino acid position 1967 of SEQ ID No 5, a glutamic acid at amino acid
position 2017 of SEQ ID No 5 and an alanine at amino acid position 2050 of SEQ ID No 5. The
present invention embodies isolated, purified, and recombinant polynucleotides which encode
polypeptides comprising a contiguous span of at least 6 amino acids, preferably at least 8 or 10
amino acids, more preferably at least 12, 15, 20, 25, 30, 40, 50, or 100 amino acids of SEQ ID No 5,
wherein said contiguous span includes at least 1, 2, 3, 5 or 10 of the amino acid positions 1 to 200,
201 to 400, 401 to 600, 601 to 800, 801 to 1000, 1001 to 1200, 1201 to 1400 and/or 1401 to 1629 of
SEQ ID No 5.

The above disclosed polynucleotide that contains the coding sequence of the BAP28 gene
may be expressed in a desired host cell or a desired host organism, when this polynucleotide is
placed under the control of suitable expression signals. The expression signals may be either the
expression signals contained in the regulatory regions in the BAP28 gene of the invention or,
alternatively, exogenous regulatory nucleic acid sequences. Such a polynucleotide, when
placed under the suitable expression signals, may also be inserted in a vector for its expression
and/or amplification.

Regulatory Sequences Of BAP28

As mentioned, the genomic sequence of the BAP28 gene contains regulatory sequences
both in the non-coding 5'-flanking region and in the non-coding 3'-flanking region that border the
BAP28 coding region containing the 45 exons of this gene.

The 5'-regulatory sequence of the BAP28 gene is localized between the nucleotide in
position 2996 and the nucleotide in position 4996 of the nucleotide sequence of SEQ ID No 1. The
5'-regulatory sequence contains the BAP28 promoter site.

The genomic sequence of the BAP28 gene also contains regulatory sequences in the
non-coding 3'-flanking region that border the BAP28 coding region. The 3'-regulatory sequence of the
BAP28 gene is localized between nucleotide position 91852 and nucleotide position 97662 of SEQ
ID No 1.

Polynucleotides derived from the 5' and 3' regulatory regions are useful in order to detect 
the presence of at least a copy of a nucleotide sequence of SEQ ID No 1 or a fragment thereof in a 
test sample. 

The promoter activity of the 5' regulatory regions contained in BAP28 can be assessed as
described below.

In order to identify the relevant biologically active polynucleotide fragments or variants of
SEQ ID No 1, one skilled in the art will refer to the book of Sambrook et al. (1989), which describes
the use of a recombinant vector carrying a marker gene (e.g. beta galactosidase, chloramphenicol
acetyl transferase, etc.) the expression of which will be detected when placed under the control of
biologically active polynucleotide fragments or variants of SEQ ID No 1. Genomic sequences
located upstream of the first exon of the BAP28 gene are cloned into a suitable promoter reporter
vector, such as the pSEAP-Basic, pSEAP-Enhancer, pβgal-Basic, pβgal-Enhancer, or pEGFP-1
Promoter Reporter vectors available from Clontech, or the pGL2-basic or pGL3-basic promoterless
luciferase reporter gene vectors from Promega. Briefly, each of these promoter reporter vectors
includes multiple cloning sites positioned upstream of a reporter gene encoding a readily assayable
protein such as secreted alkaline phosphatase, luciferase, β galactosidase, or green fluorescent
protein. The sequences upstream of the BAP28 coding region are inserted into the cloning sites
upstream of the reporter gene in both orientations and introduced into an appropriate host cell. The
level of reporter protein is assayed and compared to the level obtained from a vector which lacks an
insert in the cloning site. The presence of an elevated expression level in the vector containing the
insert with respect to the control vector indicates the presence of a promoter in the insert. If
necessary, the upstream sequences can be cloned into vectors which contain an enhancer for
increasing transcription levels from weak promoter sequences. A significant level of expression
above that observed with the vector lacking an insert indicates that a promoter sequence is present in
the inserted upstream sequence.
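The comparison between an insert-containing vector and the insert-less control can be sketched as a simple fold-change calculation. This is a minimal illustration, not a protocol from the specification: the 2-fold threshold, the signal values in the usage note, and the function names are all hypothetical, and a real analysis would use replicate measurements and a statistical test for significance.

```python
def fold_over_control(insert_signal: float, control_signal: float) -> float:
    """Reporter signal of the insert-containing vector relative to the empty vector."""
    if control_signal <= 0:
        raise ValueError("control signal must be positive")
    return insert_signal / control_signal


def suggests_promoter(insert_signal: float, control_signal: float,
                      threshold: float = 2.0) -> bool:
    """True if the insert raises reporter expression above a chosen fold threshold.

    The default threshold of 2.0 is an arbitrary illustrative value.
    """
    return fold_over_control(insert_signal, control_signal) >= threshold
```

For example, a reporter reading of 500 units with the insert against 100 units for the empty vector gives a 5-fold increase, which this sketch would flag as suggestive of promoter activity.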
Promoter sequences within the upstream genomic DNA may be further defined by
constructing nested 5' and/or 3' deletions in the upstream DNA using conventional techniques such
as Exonuclease III or appropriate restriction endonuclease digestion. The resulting deletion
fragments can be inserted into the promoter reporter vector to determine whether the deletion has
reduced or obliterated promoter activity, such as described, for example, by Coles et al. (1998). In
this way, the boundaries of the promoters may be defined. If desired, potential individual regulatory
sites within the promoter may be identified using site directed mutagenesis or linker scanning to
obliterate potential transcription factor binding sites within the promoter individually or in
combination. The effects of these mutations on transcription levels may be determined by inserting
the mutations into cloning sites in promoter reporter vectors. This type of assay is well-known to
those skilled in the art and is described in WO 97/17359; US 5,374,544; EP 582 796; US 5,698,389;
US 5,643,746; US 5,502,176; and US 5,266,488; incorporated herein by reference.

The strength and the specificity of the promoter of the BAP28 gene can be assessed
through the expression levels of a detectable polynucleotide operably linked to the BAP28 promoter
in different types of cells and tissues. The detectable polynucleotide may be either a polynucleotide
that specifically hybridizes with a predefined oligonucleotide probe, or a polynucleotide encoding a
detectable protein, including a BAP28 polypeptide or a fragment or a variant thereof. This type of
assay is well-known to those skilled in the art and is described in US 5,502,176; and US 5,266,488;
incorporated herein by reference. Some of the methods are discussed in more detail below.

Polynucleotides carrying the regulatory elements located at the 5' end and at the 3' end of
the BAP28 coding region may be advantageously used to control the transcriptional and translational
activity of a heterologous polynucleotide of interest.

Thus, the present invention also concerns a purified or isolated nucleic acid comprising a
polynucleotide which is selected from the group consisting of the 5' and 3' regulatory regions, or a
sequence complementary thereto or a biologically active fragment or variant thereof.

The invention also pertains to a purified or isolated nucleic acid comprising a
polynucleotide having at least 95% nucleotide identity with a polynucleotide selected from the group
consisting of the 5' and 3' regulatory regions, advantageously 99% nucleotide identity, preferably
99.5% nucleotide identity and most preferably 99.8% nucleotide identity with a polynucleotide
selected from the group consisting of the 5' and 3' regulatory regions, or a sequence complementary
thereto or a variant thereof or a biologically active fragment thereof.

Another object of the invention consists of purified, isolated or recombinant nucleic acids
comprising a polynucleotide that hybridizes, under the stringent hybridization conditions defined
herein, with a polynucleotide selected from the group consisting of the nucleotide sequences of the
5' and 3' regulatory regions, or a sequence complementary thereto or a variant thereof or a
biologically active fragment thereof.

Preferred fragments of either the 5' or 3' regulatory region have a length of about 1500 or
1000 nucleotides, preferably of about 500 nucleotides, more preferably about 400 nucleotides, even
more preferably about 300 nucleotides and most preferably about 200 nucleotides.

"Biologically active" polynucleotide derivatives of SEQ ID No 1 are polynucleotides
comprising or alternatively consisting of a fragment of said polynucleotide which is functional as a
regulatory region for expressing a recombinant polypeptide or a recombinant polynucleotide in a
recombinant cell host. Such a fragment could act either as an enhancer or as a repressor.

For the purpose of the invention, a nucleic acid or polynucleotide is "functional" as a
regulatory region for expressing a recombinant polypeptide or a recombinant polynucleotide if said
regulatory polynucleotide contains nucleotide sequences which contain transcriptional and
translational regulatory information, and such sequences are "operably linked" to nucleotide
sequences which encode the desired polypeptide or the desired polynucleotide.

The regulatory polynucleotides of the invention may be prepared from the nucleotide
sequence of SEQ ID No 1 by cleavage using suitable restriction enzymes, as described for example
in the book of Sambrook et al. (1989). The regulatory polynucleotides may also be prepared by
digestion of SEQ ID No 1 by an exonuclease enzyme, such as Bal31 (Wabiko et al., 1986). These
regulatory polynucleotides can also be prepared by nucleic acid chemical synthesis, as described
elsewhere in the specification.

The regulatory polynucleotides according to the invention may be part of a recombinant
expression vector that may be used to express a coding sequence in a desired host cell or host
organism. The recombinant expression vectors according to the invention are described elsewhere
in the specification.

A preferred 5'-regulatory polynucleotide of the invention thus includes the 5'-UTR of the
BAP28 cDNA, or a biologically active fragment or variant thereof.

A preferred 3'-regulatory polynucleotide of the invention includes the 3'-UTR of the
BAP28 cDNA, or a biologically active fragment or variant thereof.

A further object of the invention consists of a purified or isolated nucleic acid comprising:
a) a nucleic acid comprising a regulatory nucleotide sequence selected from the group
consisting of:
    (i) a nucleotide sequence comprising a polynucleotide of the 5' regulatory region
    or a complementary sequence thereto;
    (ii) a nucleotide sequence comprising a polynucleotide having at least 95%
    nucleotide identity with the nucleotide sequence of the 5' regulatory region or a
    complementary sequence thereto;
    (iii) a nucleotide sequence comprising a polynucleotide that hybridizes under
    stringent hybridization conditions with the nucleotide sequence of the 5' regulatory region or
    a complementary sequence thereto; and
    (iv) a biologically active fragment or variant of the polynucleotides in (i), (ii) and
    (iii);
b) a polynucleotide encoding a desired polypeptide or a nucleic acid of interest, operably
linked to the nucleic acid defined in (a) above;
c) in some embodiments, a nucleic acid comprising a 3'-regulatory polynucleotide,
preferably a 3'-regulatory polynucleotide of the BAP28 gene.

In a specific embodiment of the nucleic acid defined above, said nucleic acid includes the 
5'-UTR of the BAP28 cDNA, or a biologically active fragment or variant thereof. 

In a second specific embodiment of the nucleic acid defined above, said nucleic acid
includes the 3'-UTR of the BAP28 cDNA, or a biologically active fragment or variant thereof.

The desired polypeptide encoded by the above-described nucleic acid may be of various
nature or origin, encompassing proteins of prokaryotic or eukaryotic origin. The polypeptides
that may be expressed under the control of a BAP28 regulatory region include bacterial, fungal or
viral antigens. Also encompassed are eukaryotic proteins such as intracellular proteins, like
"housekeeping" proteins, membrane-bound proteins, like receptors, and secreted proteins, like
endogenous mediators such as cytokines. The desired polypeptide may be the BAP28 protein,
especially the protein of the amino acid sequence of SEQ ID No 1, or a fragment or a variant thereof.

The desired nucleic acid encoded by the above-described polynucleotide, usually an RNA
molecule, may be complementary to a desired coding polynucleotide, for example to the BAP28
coding sequence, and thus useful as an antisense polynucleotide.

Such a polynucleotide may be included in a recombinant expression vector in order to
express the desired polypeptide or the desired nucleic acid in a host cell or in a host organism.
Suitable recombinant vectors that contain a polynucleotide such as described hereinbefore are
disclosed elsewhere in the specification.

Polynucleotide Constructs

The terms "polynucleotide construct" and "recombinant polynucleotide" are used 
interchangeably herein to refer to linear or circular, purified or isolated polynucleotides that have 
been artificially designed and which comprise at least two nucleotide sequences that are not found as 
contiguous nucleotide sequences in their initial natural environment. 

DNA Construct That Enables Directing Temporal And Spatial BAP28 Gene
Expression In Recombinant Cell Hosts And In Transgenic Animals.

In order to study the physiological and phenotypic consequences of a lack of synthesis of
the BAP28 protein, both at the cell level and at the multicellular organism level, the invention also
encompasses DNA constructs and recombinant vectors enabling a conditional expression of a
specific allele of the BAP28 genomic sequence or cDNA and also of a copy of this genomic
sequence or cDNA harboring substitutions, deletions, or additions of one or more bases with regard to
the BAP28 nucleotide sequence of SEQ ID Nos 1-3, or a fragment thereof, these base substitutions,
deletions or additions being located either in an exon, an intron or a regulatory sequence, but
preferably in an exon of the BAP28 genomic sequence or within the BAP28 cDNA of SEQ ID No 2
or 3. In a preferred embodiment, the BAP28 sequence comprises a biallelic marker of the
present invention, preferably one of the biallelic markers A1 to A58, preferably A1 to A27, A34,
A37 to A41, A43 to A49, A52, and A54 to A58, more preferably one of the biallelic markers A1,
A4, 16, A30, A31, A42, A50, A51, and A53.

In an additional embodiment, the invention concerns a DNA construct comprising an exon
of PCTA-1 selected from the group consisting of exons A, B, C, and D.

The present invention embodies recombinant vectors comprising any one of the
polynucleotides described in the present invention. More particularly, the polynucleotide constructs
according to the present invention can comprise any of the polynucleotides described in the
"Genomic Sequences Of The Human BAP28 Gene" section, the "BAP28 cDNA Sequences" section,
the "Coding Regions" section, and the "Oligonucleotide Probes And Primers" section.

A first preferred DNA construct is based on the tetracycline resistance operon tet from E.
coli transposon Tn10 for controlling the BAP28 gene expression, such as described by Gossen et
al. (1992, 1995) and Furth et al. (1994). Such a DNA construct contains seven tet operator sequences
from Tn10 (tetop) that are fused to a minimal promoter, said minimal promoter being operably
linked to a polynucleotide of interest that codes either for a sense or an antisense oligonucleotide or
for a polypeptide, including a BAP28 polypeptide or a peptide fragment thereof. This DNA
construct is functional as a conditional expression system for the nucleotide sequence of interest
when the same cell also comprises a nucleotide sequence coding for either the wild type (tTA) or the
mutant (rTA) repressor fused to the activating domain of viral protein VP16 of herpes simplex
virus, placed under the control of a promoter, such as the HCMVIE1 enhancer/promoter or the
MMTV-LTR. Indeed, a preferred DNA construct of the invention comprises both the polynucleotide
containing the tet operator sequences and the polynucleotide containing a sequence coding for the
tTA or the rTA repressor.

In a specific embodiment, the conditional expression DNA construct contains the sequence
encoding the mutant tetracycline repressor rTA; in that case, the expression of the polynucleotide
of interest is silent in the absence of tetracycline and induced in its presence.
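
The switching behavior of the two transactivators can be summarized as a small truth table. The sketch below is illustrative only: the function name `expression_is_on` is hypothetical, and the tTA branch (active without tetracycline) reflects the generally known behavior of the wild type repressor fusion rather than a statement made in this specification.

```python
def expression_is_on(transactivator: str, tetracycline_present: bool) -> bool:
    """Return True when the tet operator/minimal promoter unit is expected to be active.

    transactivator: "tTA" (wild type repressor fused to VP16) or
                    "rTA" (mutant repressor fused to VP16), as named in the text.
    """
    if transactivator == "tTA":
        # Assumption: the wild type fusion binds the tet operators only when
        # tetracycline is absent, so the drug switches expression off.
        return not tetracycline_present
    if transactivator == "rTA":
        # Stated in the text: silent without tetracycline, induced with it.
        return tetracycline_present
    raise ValueError(f"unknown transactivator: {transactivator!r}")


if __name__ == "__main__":
    for name in ("tTA", "rTA"):
        for tet in (False, True):
            print(name, "tetracycline" if tet else "no tetracycline",
                  "->", "ON" if expression_is_on(name, tet) else "OFF")
```

The two transactivators therefore give opposite responses to the same drug, which is what makes the choice between them a design decision for the conditional construct.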

DNA Constructs Allowing Homologous Recombination: Replacement Vectors 

A second preferred DNA construct will comprise, from 5'-end to 3'-end: (a) a first
nucleotide sequence that is comprised in the BAP28 genomic sequence; (b) a nucleotide sequence
comprising a positive selection marker, such as the marker for neomycin resistance (neo); and (c) a
second nucleotide sequence that is comprised in the BAP28 genomic sequence, and is located on the
genome downstream of the first BAP28 nucleotide sequence (a).

In a preferred embodiment, this DNA construct also comprises a negative selection marker
located upstream of the nucleotide sequence (a) or downstream of the nucleotide sequence (c).
Preferably, the negative selection marker consists of the thymidine kinase (tk) gene (Thomas et al.,
1986), the hygromycin beta gene (Te Riele et al., 1990), the hprt gene (Van der Lugt et al., 1991;
Reid et al., 1990) or the Diphtheria toxin A fragment (Dt-A) gene (Nada et al., 1993; Yagi et
al., 1990). Preferably, the positive selection marker is located within a BAP28 exon sequence so as to
interrupt the sequence encoding a BAP28 protein. These replacement vectors are described, for
example, by Thomas et al. (1986; 1987), Mansour et al. (1988) and Koller et al. (1992).

The first and second nucleotide sequences (a) and (c) may be indifferently located within a 
BAP28 regulatory sequence, an intronic sequence, an exon sequence or a sequence containing both 
regulatory and/or intronic and/or exon sequences. The size of the nucleotide sequences (a) and (c) 
ranges from 1 to 50 kb, preferably from 1 to 10 kb, more preferably from 2 to 6 kb and most
preferably from 2 to 4 kb.
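
As a reading aid, the nested size ranges given for the flanking sequences (a) and (c) can be expressed as a simple classifier. This is a sketch only; the function name `arm_preference` and the tier labels are hypothetical and serve merely to restate the ranges above.

```python
def arm_preference(length_kb: float) -> str:
    """Classify the size of a flanking sequence (a) or (c), in kb,
    against the nested ranges stated in the text."""
    if 2 <= length_kb <= 4:
        return "most preferred"   # 2 to 4 kb
    if 2 <= length_kb <= 6:
        return "more preferred"   # 2 to 6 kb
    if 1 <= length_kb <= 10:
        return "preferred"        # 1 to 10 kb
    if 1 <= length_kb <= 50:
        return "allowed"          # 1 to 50 kb
    return "out of range"


if __name__ == "__main__":
    for kb in (0.5, 3, 5, 8, 20, 60):
        print(f"{kb} kb -> {arm_preference(kb)}")
```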

DNA Constructs Allowing Homologous Recombination: Cre-LoxP System. 

These new DNA constructs make use of the site specific recombination system of the P1
phage. The P1 phage possesses a recombinase called Cre which interacts specifically with a 34 base
pair loxP site. The loxP site is composed of two palindromic sequences of 13 bp separated by an 8
bp conserved sequence (Hoess et al., 1986). The recombination by the Cre enzyme between two
loxP sites having an identical orientation leads to the deletion of the DNA fragment located
between them.
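
The 13 + 8 + 13 bp layout of the loxP site and the deletion between two sites of identical orientation can be illustrated in a few lines. The sketch below rests on stated assumptions: the loxP sequence shown is the commonly cited wild type sequence and is not taken from this specification, the function name `cre_excise` is hypothetical, and only sites on the given strand in the same orientation are handled.

```python
# Assumed wild type loxP: two 13 bp inverted repeats around an 8 bp spacer.
LEFT_ARM = "ATAACTTCGTATA"
SPACER = "GCATACAT"
RIGHT_ARM = "TATACGAAGTTAT"
LOXP = LEFT_ARM + SPACER + RIGHT_ARM  # 34 bp in total


def cre_excise(seq: str) -> str:
    """Model Cre-mediated deletion between the first two same-orientation loxP sites.

    The fragment between the sites is removed together with one loxP site,
    so a single loxP site remains in the returned sequence.
    """
    first = seq.find(LOXP)
    if first == -1:
        return seq  # no loxP site: nothing to excise
    second = seq.find(LOXP, first + len(LOXP))
    if second == -1:
        return seq  # a single site cannot recombine with itself
    return seq[: first + len(LOXP)] + seq[second + len(LOXP):]


if __name__ == "__main__":
    floxed = "GGGG" + LOXP + "AAAATTTT" + LOXP + "CCCC"
    print(len(LOXP))           # length of the site
    print(cre_excise(floxed))  # flanks joined through one remaining loxP site
```

Leaving one loxP site behind after excision matches the usual outcome of recombination between directly repeated sites; a model of inverted sites (inversion rather than deletion) would need a separate branch and is not attempted here.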

The Cre-loxP system used in combination with a homologous recombination technique was
first described by Gu et al. (1993, 1994). Briefly, a nucleotide sequence of interest to be
inserted in a targeted location of the genome harbors at least two loxP sites in the same orientation
and located at the respective ends of a nucleotide sequence to be excised from the recombinant
genome. The excision event requires the presence of the recombinase (Cre) enzyme within the
nucleus of the recombinant cell host. The recombinase enzyme may be brought at the desired time
by: (a) incubating the recombinant cell hosts in a culture medium containing this enzyme,
injecting the Cre enzyme directly into the desired cell, such as described by Araki et al. (1995), or
lipofection of the enzyme into the cells, such as described by Baubonis et al. (1993); (b) transfecting
the cell host with a vector comprising the Cre coding sequence operably linked to a promoter
functional in the recombinant cell host (in some embodiments, the promoter may be inducible), said
vector being introduced in the recombinant cell host, such as described by Gu et al. (1993) and Sauer
et al. (1988); or (c) introducing in the genome of the cell host a polynucleotide comprising the Cre
coding sequence operably linked to a promoter functional in the recombinant cell host (in some
embodiments, the promoter may be inducible), said polynucleotide being inserted in the genome
of the cell host either by a random insertion event or a homologous recombination event, such as
described by Gu et al. (1994).

In a specific embodiment, the vector containing the sequence to be inserted in the BAP28
gene by homologous recombination is constructed in such a way that selectable markers are flanked
by loxP sites of the same orientation. It is then possible, by treatment with the Cre enzyme, to
eliminate the selectable markers while leaving the BAP28 sequences of interest that have been
inserted by a homologous recombination event. Again, two selectable markers are needed: a positive
selection marker to select for the recombination event and a negative selection marker to select for
the homologous recombination event. Vectors and methods using the Cre-loxP system are described by
Zou et al. (1994).

Thus, a third preferred DNA construct of the invention comprises, from 5'-end to 3'-end:
(a) a first nucleotide sequence that is comprised in the BAP28 genomic sequence; (b) a nucleotide
sequence comprising a polynucleotide encoding a positive selection marker, said nucleotide
sequence comprising additionally two sequences defining a site recognized by a recombinase, such
as a loxP site, the two sites being placed in the same orientation; and (c) a second nucleotide
sequence that is comprised in the BAP28 genomic sequence, and is located on the genome
downstream of the first BAP28 nucleotide sequence (a).

The sequences defining a site recognized by a recombinase, such as a loxP site, are
preferably located within the nucleotide sequence (b) at suitable locations bordering the nucleotide
sequence for which the conditional excision is sought. In one specific embodiment, two loxP sites
are located at each side of the positive selection marker sequence, in order to allow its excision at a
desired time after the occurrence of the homologous recombination event.

In a preferred embodiment of a method using the third DNA construct described above, the
excision of the polynucleotide fragment bordered by the two sites recognized by a recombinase,
preferably two loxP sites, is performed at a desired time, due to the presence within the genome of
the recombinant host cell of a sequence encoding the Cre enzyme operably linked to a promoter
sequence, preferably an inducible promoter, more preferably a tissue-specific promoter sequence and
most preferably a promoter sequence which is both inducible and tissue-specific, such as described
by Gu et al. (1994).

The presence of the Cre enzyme within the genome of the recombinant cell host may result
from the breeding of two transgenic animals, the first transgenic animal bearing the BAP28-derived
sequence of interest containing the loxP sites as described above and the second transgenic animal
bearing the Cre coding sequence operably linked to a suitable promoter sequence, such as described
by Gu et al. (1994).

Spatio-temporal control of the Cre enzyme expression may also be achieved with an
adenovirus based vector that contains the Cre gene thus allowing infection of cells, or in vivo
infection of organs, for delivery of the Cre enzyme, such as described by Anton and Graham (1995)
and Kanegae et al. (1995).

The DNA constructs described above may be used to introduce a desired nucleotide
sequence of the invention, preferably a BAP28 genomic sequence or a BAP28 cDNA sequence, and
most preferably an altered copy of a BAP28 genomic or cDNA sequence, within a predetermined
location of the targeted genome, leading either to the generation of an altered copy of a targeted gene
(knock-out homologous recombination) or to the replacement of a copy of the targeted gene by
another copy sufficiently homologous to allow a homologous recombination event to occur (knock-in
homologous recombination). In a specific embodiment, the DNA constructs described above may
be used to introduce a BAP28 genomic sequence or a BAP28 cDNA sequence. In some
embodiments, said sequence comprises at least one biallelic marker of the present invention,
preferably at least one biallelic marker selected from the group consisting of A1 to A58, preferably
A1 to A27, A34, A37 to A41, A43 to A49, A52, and A54 to A58, more preferably one of the
biallelic markers A1, A4, 16, A30, A31, A42, A50, A51, and A53.

Nuclear Antisense DNA Constructs

Other compositions contain a vector of the invention comprising an oligonucleotide
fragment of the nucleic sequence SEQ ID No 2 or 3, preferably a fragment including the start codon
of the BAP28 gene, as an antisense tool that inhibits the expression of the corresponding BAP28
gene or the expression of the PCTA-1 gene. Preferred methods using antisense polynucleotides
according to the present invention are the procedures described by Sczakiel et al. (1995) or those
described in PCT Application No WO 95/24223.

Preferably, the antisense tools are chosen among the polynucleotides (15-200 bp long) that
are complementary to the 5' end or 3' end of the BAP28 mRNA. In one embodiment, a combination
of different antisense polynucleotides complementary to different parts of the desired targeted gene
is used.

A preferred antisense according to the invention is a polynucleotide according to the
invention, more particularly polynucleotides comprising a contiguous span of at least 12, 15, 18, 20,
25, 30, 50, 80, 100, 150, 200, 250, 300, 350, 400, 450, 500, 600 or 1000 nucleotides of SEQ ID No
1, or the complements thereof, wherein said contiguous span comprises at least 1, 2, 3, 5, or 10
nucleotide positions of any one of the following ranges of nucleotide positions of SEQ ID No 1:
57386-57494, 58504-59354, 85947-86108, and 91259-91325.

Preferred antisense polynucleotides according to the present invention are complementary
to a sequence of the mRNAs of BAP28 that contains either the translation initiation codon ATG or a
splicing site. Further preferred antisense polynucleotides according to the invention are
complementary to the splicing site of the BAP28 mRNA.

The antisense nucleic acids should have a length and melting temperature sufficient to
permit formation of an intracellular duplex having sufficient stability to inhibit the expression of the
BAP28 mRNA in the duplex. Strategies for designing antisense nucleic acids suitable for use in
gene therapy are disclosed in Green et al. (1986) and Izant and Weintraub (1984), the disclosures of
which are incorporated herein by reference.

In some strategies, antisense molecules are obtained by reversing the orientation of the 
BAP28 coding region with respect to a promoter so as to transcribe the opposite strand from that 
which is normally transcribed in the cell. The antisense molecules may be transcribed using in vitro 
transcription systems such as those which employ T7 or SP6 polymerase to generate the transcript. 
Another approach involves transcription of BAP28 antisense nucleic acids in vivo by operably
linking DNA containing the antisense sequence to a promoter in a suitable expression vector. 
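
Reversing the orientation of a coding region so that the opposite strand is transcribed amounts to taking the reverse complement of the sense sequence. The following minimal sketch illustrates this; the function names are hypothetical and the six-nucleotide input is an arbitrary example, not a BAP28 sequence.

```python
# Translation table for complementing unambiguous DNA bases.
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(dna: str) -> str:
    """Return the reverse complement of a DNA string (A, C, G, T only)."""
    return dna.translate(_COMPLEMENT)[::-1]


def antisense_rna(sense_dna: str) -> str:
    """Return the RNA transcribed from the strand opposite to the sense strand,
    written 5' to 3'. It is complementary to the mRNA of the sense sequence."""
    return reverse_complement(sense_dna).upper().replace("T", "U")


if __name__ == "__main__":
    sense = "ATGGCC"             # arbitrary example beginning with a start codon
    print(antisense_rna(sense))  # GGCCAU
```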

Alternatively, suitable antisense strategies are those described by Rossi et al. (1991), in the
International Applications Nos. WO 94/23026, WO 95/04141, WO 92/18522 and in the European
Patent Application No EP 0 572 287 A2.

Preferably, the antisense polynucleotides of the invention have a 3' polyadenylation signal
that has been replaced with a self-cleaving ribozyme sequence, such that RNA polymerase II
transcripts are produced without poly(A) at their 3' ends, these antisense polynucleotides being
incapable of export from the nucleus, such as described by Liu et al. (1994). In a preferred
embodiment, these BAP28 antisense polynucleotides also comprise, within the ribozyme cassette, a
histone stem-loop structure to stabilize cleaved transcripts against 3'-5' exonucleolytic degradation,
such as the structure described by Eckner et al. (1991).

An alternative to the antisense technology that is used according to the present invention
consists in using ribozymes that will bind to a target sequence via their complementary
polynucleotide tail and that will cleave the corresponding RNA by hydrolyzing its target site
(namely "hammerhead ribozymes"). Briefly, the simplified cycle of a hammerhead ribozyme
consists of (1) sequence specific binding to the target RNA via complementary antisense
sequences; (2) site-specific hydrolysis of the cleavable motif of the target strand; and (3) release of
cleavage products, which gives rise to another catalytic cycle. Indeed, the use of long-chain
antisense polynucleotides (at least 30 bases long) or ribozymes with long antisense arms is
advantageous. A preferred delivery system for antisense ribozymes is achieved by covalently
linking these antisense ribozymes to lipophilic groups or by using liposomes as a convenient vector.
Preferred antisense ribozymes according to the present invention are prepared as described by
Sczakiel et al. (1995), the specific preparation procedures referred to in said article being
herein incorporated by reference.

Oligonucleotide Probes And Primers

Polynucleotides derived from the BAP28 gene are useful in order to detect the presence of 
at least a copy of a nucleotide sequence of SEQ ID Nos 1-3, or a fragment, complement, or variant 
thereof in a test sample. 

Preferred probes and primers of the invention include isolated, purified, or recombinant
polynucleotides comprising a contiguous span of at least 12, 15, 18, 20, 25, 30, 50, 80, 100, 150, or
200 nucleotides, to the extent that such a length is consistent with the lengths of the particular
nucleotide position, of SEQ ID No 1 or the complement thereof, wherein said contiguous span
comprises at least 1, 2, 3, 5, 10, 20, 30, 40 or 50 nucleotides selected from the group consisting of
the following nucleotide positions of SEQ ID No 1: 4997-5076, 5371-5544, 6121-6337, 9877-10018,
11522-11623, 12521-12661, 13453-13664, 13824-13957, 15376-15478, 16855-16965,
17378-17495, 18535-18642, 21446-21541, 21999-22087, 23036-23247, 23546-23667, 24270-24461,
26287-26470, 26611-26747, 28068-28260, 32540-32709, 33112-33270, 34586-34828,
35156-35287, 36660-36763, 36934-37077, 37803-37921, 38017-38138, 40365-40493, 42618-42848,
43452-43578, 44836-44999, 48223-48269, and 49656-49779. Particularly preferred probes
and primers of the invention include isolated, purified, or recombinant polynucleotides comprising a
contiguous span of at least 12, 15, 18, 20, 25, 30, 35, 40, 50, 60, 70, 80, 90, 100, 150, 200, 500, or
1000 nucleotides of SEQ ID No 1 or the complements thereof, wherein said contiguous span
comprises at least 1, 2, 3, 5, or 10 of the following nucleotide positions of SEQ ID No 1: 1-50357,
50499-50963, 51257-52147, 52299-53234, 53394-53553, 53689-53837, 53943-54028, 54198-54740,
54896-55753, 55913-57385, 57495-58503, 58828-85946, 59355-85946, 86169-91228,
and/or 91852 to 97662.

Particularly preferred embodiments of the invention include isolated, purified, or
recombinant polynucleotides comprising a contiguous span of at least 12, 15, 18, 20, 25, 30, 35, 40,
50, 60, 70, 80, 90, 100, 150, 200, 500, or 1000 nucleotides of a nucleic acid sequence selected from
the group consisting of SEQ ID Nos 2 and 3 or the complements thereof, wherein said contiguous
span comprises at least 1, 2, 3, 5, or 10 of nucleotide positions 1 to 4995 of SEQ ID No 2 or 3.
Further embodiments of the invention include isolated, purified, or recombinant polynucleotides
comprising a contiguous span of at least 12, 15, 18, 20, 25, 30, 35, 40, 50, 60, 70, 80, 90, 100, 150,
200, 500, or 1000 nucleotides of a nucleic acid sequence selected from the group consisting of SEQ
ID Nos 2 and 3 or the complements thereof, wherein said contiguous span comprises at least 1, 2, 3,
5, or 10 of the following nucleotide positions of SEQ ID No 2 or 3: 1 to 2033, 2160 to 2348, and
2676 to 4995.
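
Whether a given contiguous span "comprises at least N" of the listed nucleotide positions can be checked by counting the overlap between the span and each listed range. The sketch below assumes 1-based, inclusive coordinates and non-overlapping ranges; the function name `covered_positions` is hypothetical, and the ranges are those recited above for SEQ ID No 2 or 3.

```python
from typing import Iterable, Tuple

# Ranges recited in the text for SEQ ID No 2 or 3 (1-based, inclusive).
CDNA_RANGES = [(1, 2033), (2160, 2348), (2676, 4995)]


def covered_positions(span_start: int, span_end: int,
                      ranges: Iterable[Tuple[int, int]]) -> int:
    """Count how many positions of the span [span_start, span_end] fall inside
    any of the given ranges. Assumes the ranges do not overlap each other."""
    total = 0
    for lo, hi in ranges:
        overlap = min(span_end, hi) - max(span_start, lo) + 1
        if overlap > 0:
            total += overlap
    return total


if __name__ == "__main__":
    # A 31-nucleotide span running from position 2020 to position 2050.
    n = covered_positions(2020, 2050, CDNA_RANGES)
    print(n, "positions covered; meets 'at least 10':", n >= 10)
```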

Additional preferred probes and primers of the invention include isolated, purified, or
recombinant polynucleotides comprising a contiguous span of at least 12, 15, 18, 20, 25, 30, 35, 40,
50, 60, 70, 80, 90, 100, 150, 200, 500, or 1000 nucleotides of a nucleic acid sequence selected from
the group consisting of SEQ ID Nos 1-3, or the complements thereof, wherein said contiguous span
comprises at least 1, 2, 3, 5, or 10 nucleotide positions of any one of the following ranges of
nucleotide positions of:

(a) SEQ ID No 1: 1-2500, 2501-5000, 5001-7500, 7501-10000, 10001-12500, 12501-15000,
15001-17500, 17501-20000, 20001-22500, 22501-25000, 25001-27500, 27501-30000,
30001-32500, 32501-35000, 35001-37500, 37501-40000, 40001-42500, 42501-45000, 45001-47500,
47501-50000, 50001-50357, 50499-50963, 51257-52147, 52299-53234, 53394-53553,
53689-53837, 53943-54028, 54198-54740, 54896-55753, 55913-57385, 57495-58503, 58828-85946,
59355-85946, 86169-91228, and/or 91852 to 97662;

(b) SEQ ID No 2: 1 to 500, 501 to 1000, 1001 to 1500, 1501 to 2000, 2001 to 2500, 2501
to 3000, 3001 to 3500, 3501 to 4000, 4001 to 4500, 4501 to 4995, 5000 to 5500, 5501 to 6000, 6001
to 6500, and 6501 to 6782; and,

(c) SEQ ID No 3: 1 to 500, 501 to 1000, 1001 to 1500, 1501 to 2000, 2001 to 2500, 2501
to 3000, 3001 to 3500, 3501 to 4000, 4001 to 4500, 4501 to 4995, 5000 to 5500, 5501 to 6000, 6001
to 6500, 6501 to 7000, 7001 to 7500, 7501 to 7932.

Thus, the invention also relates to nucleic acid probes characterized in that they hybridize
specifically, under the stringent hybridization conditions defined above, with a nucleic acid selected
from the group consisting of the nucleotide sequences:

a) 1-50357, 50499-50963, 51257-52147, 52299-53234, 53394-53553, 53689-53837,
53943-54028, 54198-54740, 54896-55753, 55913-57385, 57495-58503, 58828-85946, 59355-85946,
86169-91228, and/or 91852 to 97662 of SEQ ID No 1 or a variant thereof or a sequence
complementary thereto; or

b) 1 to 4995 of SEQ ID No 2 or 3 or a variant thereof or a sequence complementary
thereto; and,

c) at least one of nucleotide ranges 1 to 2033, 2160 to 2348, 2676 to 4995 of SEQ ID No 2
or 3, or a variant thereof or a sequence complementary thereto.

Additionally, another preferred embodiment of a probe according to the invention consists
of a nucleic acid comprising a biallelic marker selected from the group consisting of A1 to A58 or
the complements thereto, for which the respective locations in the sequence listing are provided in
Table 2. Preferably, a probe according to the present invention consists of a nucleic acid comprising
one of the biallelic markers A1 to A27, A34, A37 to A41, A43 to A49, A52, and A54 to A58. More
preferably, a probe according to the present invention consists of a nucleic acid comprising one of
the biallelic markers A1, A4, 16, A30, A31, A42, A50, A51, and A53.

In one embodiment the invention encompasses isolated, purified, and recombinant
polynucleotides comprising, consisting of, or consisting essentially of a contiguous span of 8 to 50
nucleotides of SEQ ID Nos 1, 2, or 3 and the complement thereof, wherein said span includes a
BAP28-related biallelic marker in said sequence; in some embodiments, said BAP28-related biallelic
marker is selected from the group consisting of A1 to A58, and the complements thereof, or the
biallelic markers in linkage disequilibrium therewith; in some embodiments, said BAP28-related
biallelic marker is selected from the group consisting of A1 to A27, A34, A37 to A41, A43 to A49,
A52, and A54 to A58, and the complements thereof, or the biallelic markers in linkage
disequilibrium therewith; in some embodiments, said BAP28-related biallelic marker is selected from
the group consisting of A1, A4, 16, A30, A31, A42, A50, A51, and A53, and the complements
thereof, or the biallelic markers in linkage disequilibrium therewith; in some embodiments, said
contiguous span is 18 to 35 nucleotides in length and said biallelic marker is within 4 nucleotides of
the center of said polynucleotide; in some embodiments, said polynucleotide consists of said
contiguous span and said contiguous span is 25 nucleotides in length and said biallelic marker is at
the center of said polynucleotide; in some embodiments, the 3' end of said contiguous span is
present at the 3' end of said polynucleotide; in some embodiments, the 3' end of said contiguous
span is located at the 3' end of said polynucleotide and said biallelic marker is present at the 3' end of
said polynucleotide. In a preferred embodiment, said probe comprises, consists of, or consists
essentially of a sequence selected from the following sequences: P1 to P58, preferably P1 to P27,
P34, P37 to P41, P43 to P49, P52, and P54 to P58, and the complementary sequences thereto.
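
The embodiment in which a 25-nucleotide polynucleotide carries the biallelic marker at its center can be illustrated by slicing a reference sequence around the marker position. This is a sketch under stated assumptions: positions are 1-based, the function name `centered_probe` is hypothetical, and the toy sequence is not taken from the sequence listing.

```python
def centered_probe(seq: str, marker_pos: int, length: int = 25) -> str:
    """Return a probe of odd `length` whose central nucleotide is the marker.

    seq        : reference sequence, 5' to 3'
    marker_pos : 1-based position of the biallelic marker in `seq`
    """
    if length % 2 == 0:
        raise ValueError("length must be odd for the marker to sit at the center")
    half = length // 2
    start = marker_pos - 1 - half   # 0-based index of the first probe base
    end = marker_pos + half         # 0-based index one past the last probe base
    if start < 0 or end > len(seq):
        raise ValueError("probe would extend beyond the reference sequence")
    return seq[start:end]


if __name__ == "__main__":
    toy = "A" * 12 + "G" + "T" * 12 + "CCCC"   # marker 'G' at position 13
    probe = centered_probe(toy, 13)
    print(probe, len(probe), probe[len(probe) // 2])
```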

In another embodiment the invention encompasses isolated, purified and recombinant
polynucleotides comprising, consisting of, or consisting essentially of a contiguous span of 8 to 50
nucleotides of SEQ ID Nos 1, 2, or 3 or the complements thereof, wherein the 3' end of said
contiguous span is located at the 3' end of said polynucleotide, and wherein the 3' end of said
polynucleotide is located within 20 nucleotides upstream of a BAP28-related biallelic marker in said
sequence; in some embodiments, said BAP28-related biallelic marker is selected from the group
consisting of A1 to A58, and the complements thereof, or the biallelic markers in linkage
disequilibrium therewith; in some embodiments, said BAP28-related biallelic marker is selected
from the group consisting of A1 to A27, A34, A37 to A41, A43 to A49, A52, and A54 to A58, and
the complements thereof, or the biallelic markers in linkage disequilibrium therewith; in some
embodiments, said BAP28-related biallelic marker is selected from the group consisting of A1, A4,
16, A30, A31, A42, A50, A51, and A53, and the complements thereof, or the biallelic markers in
linkage disequilibrium therewith; in some embodiments, the 3' end of said polynucleotide
is located 1 nucleotide upstream of said BAP28-related biallelic marker in said sequence; in some
embodiments, said polynucleotide consists essentially of a sequence selected from the following
sequences: D1 to D58 and E1 to E58, preferably D1 to D27, D34, D37 to D41, D43 to D49, D52,
D54 to D58, E1 to E27, E34, E37 to E41, E43 to E49, E52, and E54 to E58.
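
The geometry of a primer whose 3' end lies a given number of nucleotides upstream of the marker can be sketched as follows. The function name `upstream_primer`, its parameters and the toy sequence are hypothetical; the limits of 8 to 50 nucleotides for the span and of 1 to 20 nucleotides for the distance are those recited above, and positions are assumed to be 1-based.

```python
def upstream_primer(seq: str, marker_pos: int, length: int = 19, offset: int = 1) -> str:
    """Return a primer of `length` nucleotides whose 3' end lies `offset`
    nucleotides upstream of the marker (offset=1: immediately adjacent).

    seq        : reference sequence, 5' to 3'
    marker_pos : 1-based position of the biallelic marker in `seq`
    """
    if not 8 <= length <= 50:
        raise ValueError("length must be between 8 and 50 nucleotides")
    if not 1 <= offset <= 20:
        raise ValueError("offset must be between 1 and 20 nucleotides")
    three_prime = marker_pos - offset   # 1-based position of the primer's 3' base
    start = three_prime - length        # 0-based index of the primer's 5' base
    if start < 0:
        raise ValueError("primer would extend beyond the 5' end of the sequence")
    return seq[start:three_prime]


if __name__ == "__main__":
    toy = "T" * 10 + "ACGTACGTAC" + "G" + "CCCC"   # marker 'G' at position 21
    print(upstream_primer(toy, 21, length=10, offset=1))
```

With `offset=1` the next base to be incorporated at the 3' end is the marker itself, which is the arrangement described for the polynucleotide located 1 nucleotide upstream of the marker.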

In a further embodiment, the invention encompasses isolated, purified, or recombinant
polynucleotides comprising, consisting of, or consisting essentially of a sequence selected from the
following sequences: B1 to B38 and C1 to C38, preferably B1 to B15, B22, B24, B25, B27 to B29,
B32, B34 to B38, C1 to C15, C22, C24, C25, C27 to C29, C32, and C34 to C38.

In an additional embodiment, the invention encompasses polynucleotides for use in
hybridization assays, sequencing assays, and enzyme-based mismatch detection assays for
determining the identity of the nucleotide at a BAP28-related biallelic marker in SEQ ID No 1, or the
complements thereof, as well as polynucleotides for use in amplifying segments of nucleotides
comprising a BAP28-related biallelic marker in SEQ ID No 1 or the complements thereof; in some
embodiments, said BAP28-related biallelic marker is selected from the group consisting of A1 to
A58, and the complements thereof, or the biallelic markers in linkage disequilibrium therewith; in
some embodiments, said BAP28-related biallelic marker is selected from the group consisting of A1
to A27, A34, A37 to A41, A43 to A49, A52, and A54 to A58, and the complements thereof, or the
biallelic markers in linkage disequilibrium therewith; in some embodiments, said BAP28-related
biallelic marker is selected from the group consisting of A1, A4, 16, A30, A31, A42, A50, A51, and
A53, and the complements thereof, or the biallelic markers in linkage disequilibrium therewith.

Furthermore, the present invention also concerns the use of the oligonucleotide probes and
primers according to the invention for determining the identity of the nucleotide at a BAP28-related
biallelic marker. The use of these oligonucleotides in diagnostics is contemplated.

The formation of stable hybrids depends on the melting temperature (Tm) of the DNA. 
The Tm depends on the length of the primer or probe, the ionic strength of the solution and the G+C 
content. The higher the G+C content of the primer or probe, the higher is the melting temperature 
because G:C pairs are held by three H bonds whereas A:T pairs have only two. The GC content in 
the probes of the invention usually ranges between 10 and 75 %, preferably between 35 and 60 %,
and more preferably between 40 and 55 %. 
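
The dependence of the melting temperature on base composition can be made concrete with a short calculation. The sketch below computes the G+C content of an oligonucleotide and a rough Tm estimate using the Wallace rule, Tm = 2(A+T) + 4(G+C) in degrees Celsius. That rule is a common approximation for short oligonucleotides and is an assumption introduced here, not a formula given in this specification; it ignores ionic strength. The function names are hypothetical.

```python
def gc_percent(oligo: str) -> float:
    """Return the G+C content of an oligonucleotide as a percentage."""
    s = oligo.upper()
    return 100.0 * (s.count("G") + s.count("C")) / len(s)


def wallace_tm(oligo: str) -> int:
    """Rough Tm estimate in degrees Celsius: 2 per A/T plus 4 per G/C.

    Assumption: Wallace rule, intended for short oligonucleotides only.
    """
    s = oligo.upper()
    at = s.count("A") + s.count("T")
    gc = s.count("G") + s.count("C")
    return 2 * at + 4 * gc


def in_preferred_gc_range(oligo: str, low: float = 35.0, high: float = 60.0) -> bool:
    """Check the G+C content against the 'preferably between 35 and 60 %' range."""
    return low <= gc_percent(oligo) <= high


if __name__ == "__main__":
    oligo = "ATGCATGCATGCATGCATGC"   # arbitrary 20-mer, not a BAP28 sequence
    print(gc_percent(oligo), wallace_tm(oligo), in_preferred_gc_range(oligo))
```

Each G or C contributes twice as much as an A or T in this approximation, which reflects the qualitative statement above that a higher G+C content raises the melting temperature.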

A probe or a primer according to the invention has between 8 and 1000 nucleotides in
length, or is specified to be at least 12, 15, 18, 20, 25, 35, 40, 50, 60, 70, 80, 100, 250, 500 or 1000
nucleotides in length. More particularly, the length of these probes and primers can range from 8,
10, 15, 20, or 30 to 100 nucleotides, preferably from 10 to 50, more preferably from 15 to 30
nucleotides. Shorter probes and primers tend to lack specificity for a target nucleic acid sequence
and generally require cooler temperatures to form sufficiently stable hybrid complexes with the
template. Longer probes and primers are expensive to produce and can sometimes self-hybridize to
form hairpin structures. The appropriate length for primers and probes under a particular set of
assay conditions may be empirically determined by one of skill in the art. A preferred probe or
primer consists of a nucleic acid comprising a polynucleotide selected from the group of the
nucleotide sequences of P1 to P58 and the complementary sequences thereto, B1 to B38 and C1 to
C38, D1 to D58, E1 to E58, for which the respective locations in the sequence listing are provided in
Tables 1, 3, and 4, preferably a nucleic acid comprising a polynucleotide selected from the group of
the nucleotide sequences of P1 to P27, P34, P37 to P41, P43 to P49, P52, and P54 to P58, and the
complementary sequences thereto, B1 to B15, B22, B24, B25, B27 to B29, B32, B34 to B38, C1 to
C15, C22, C24, C25, C27 to C29, C32, C34 to C38, D1 to D27, D34, D37 to D41, D43 to D49, D52,
D54 to D58, E1 to E27, E34, E37 to E41, E43 to E49, E52, and E54 to E58.

The primers and probes can be prepared by any suitable method, including, for example,
cloning and restriction of appropriate sequences and direct chemical synthesis by a method such as
the phosphotriester method of Narang et al. (1979), the phosphodiester method of Brown et al. (1979),
the diethylphosphoramidite method of Beaucage et al. (1981) and the solid support method described
in EP 0 707 592. The disclosures of all these documents are incorporated herein by reference.

Detection probes are generally nucleic acid sequences or uncharged nucleic acid analogs
such as, for example, peptide nucleic acids, which are disclosed in International Patent Application
WO 92/20702, and morpholino analogs, which are described in U.S. Patents Numbered 5,185,444;
5,034,506 and 5,142,047. The probe may have to be rendered "non-extendable" in that additional
dNTPs cannot be added to the probe. In and of themselves, analogs usually are non-extendable, and
nucleic acid probes can be rendered non-extendable by modifying the 3' end of the probe such that
the hydroxyl group is no longer capable of participating in elongation. For example, the 3' end of
the probe can be functionalized with the capture or detection label to thereby consume or otherwise
block the hydroxyl group. Alternatively, the 3' hydroxyl group simply can be cleaved, replaced or
modified. U.S. Patent Application Serial No 07/049,061 filed April 19, 1993 describes
modifications which can be used to render a probe non-extendable.

Any of the polynucleotides of the present invention can be labeled, if desired, by
incorporating a label detectable by spectroscopic, photochemical, biochemical, immunochemical,
or chemical means. For example, useful labels include radioactive substances (32P, 35S, 3H, 125I),
fluorescent dyes (5-bromodeoxyuridine, fluorescein, acetylaminofluorene, digoxigenin) or biotin.
Preferably, polynucleotides are labeled at their 3' and 5' ends. Examples of non-radioactive
labeling of nucleic acid fragments are described in the French patent No FR-78 10975 or by Urdea
et al. (1988) or Sanchez-Pescador et al. (1988). In addition, the probes according to the present
invention may have structural characteristics that allow signal amplification, such
structural characteristics being, for example, branched DNA probes such as those described by Urdea
et al. in 1991 or in the European patent No EP 0 225 807 (Chiron).

A label can also be used to capture the primer, so as to facilitate the immobilization of
either the primer or a primer extension product, such as amplified DNA, on a solid support. A
capture label is attached to the primers or probes and can be a specific binding member which forms
a binding pair with the solid phase reagent's specific binding member (e.g. biotin and
streptavidin). Therefore, depending upon the type of label carried by a polynucleotide or a probe, it
may be employed to capture or to detect the target DNA. Further, it will be understood that the
polynucleotides, primers or probes provided herein may themselves serve as the capture label. For
example, in the case where a solid phase reagent's binding member is a nucleic acid sequence, it
may be selected such that it binds a complementary portion of a primer or probe to thereby
immobilize the primer or probe to the solid phase. In cases where a polynucleotide probe itself
serves as the binding member, those skilled in the art will recognize that the probe will contain a
sequence or "tail" that is not complementary to the target. In the case where a polynucleotide primer
itself serves as the capture label, at least a portion of the primer will be free to hybridize with a
nucleic acid on a solid phase. DNA labeling techniques are well known to the skilled technician.

The probes of the present invention are useful for a number of purposes. They can be 
notably used in Southern hybridization to genomic DNA. The probes can also be used to detect 
PCR amplification products. They may also be used to detect mismatches in the BAP28 gene or 
mRNA using other techniques. 

Any of the polynucleotides, primers and probes of the present invention can be
conveniently immobilized on a solid support. Solid supports are known to those skilled in the art
and include the walls of wells of a reaction tray, test tubes, polystyrene beads, magnetic beads,
nitrocellulose strips, membranes, microparticles such as latex particles, sheep (or other animal) red
blood cells, duracytes and others. The solid support is not critical and can be selected by one skilled
in the art. Thus, latex particles, microparticles, magnetic or non-magnetic beads, membranes, plastic
tubes, walls of microtiter wells, glass or silicon chips, sheep (or other suitable animal's) red blood
cells and duracytes are all suitable examples. Suitable methods for immobilizing nucleic acids on
solid phases include ionic, hydrophobic, covalent interactions and the like. A solid support, as used
herein, refers to any material which is insoluble, or can be made insoluble by a subsequent reaction.
The solid support can be chosen for its intrinsic ability to attract and immobilize the capture reagent.
Alternatively, the solid phase can retain an additional receptor which has the ability to attract and
immobilize the capture reagent. The additional receptor can include a charged substance that is
oppositely charged with respect to the capture reagent itself or to a charged substance conjugated to
the capture reagent. As yet another alternative, the receptor molecule can be any specific binding
member which is immobilized upon (attached to) the solid support and which has the ability to
immobilize the capture reagent through a specific binding reaction. The receptor molecule enables
the indirect binding of the capture reagent to a solid support material before the performance of the
assay or during the performance of the assay. The solid phase thus can be a plastic, derivatized
plastic, magnetic or non-magnetic metal, glass or silicon surface of a test tube, microtiter well, sheet,
bead, microparticle, chip, sheep (or other suitable animal's) red blood cells, duracytes® and other
configurations known to those of ordinary skill in the art. The polynucleotides of the invention can
be attached to or immobilized on a solid support individually or in groups of at least 2, 5, 8, 10, 12,
15, 20, or 25 distinct polynucleotides of the invention to a single solid support. In addition,
polynucleotides other than those of the invention may be attached to the same solid support as one or
more polynucleotides of the invention.

Consequently, the invention also deals with a method for detecting the presence of a
nucleic acid comprising a nucleotide sequence selected from a group consisting of SEQ ID Nos 1-4,
9-13, a fragment or a variant thereof and a complementary sequence thereto in a sample, said
method comprising the steps of:

a) bringing into contact a nucleic acid probe or a plurality of nucleic acid probes which can
hybridize with a nucleotide sequence included in a nucleic acid selected from the group consisting of
the nucleotide sequences of SEQ ID Nos 1-4, 9-13, a fragment or a variant thereof and a
complementary sequence thereto and the sample to be assayed; and

b) detecting the hybrid complex formed between the probe and a nucleic acid in the sample.

The invention further concerns a kit for detecting the presence of a nucleic acid comprising
a nucleotide sequence selected from a group consisting of SEQ ID Nos 1-4, 9-13, a fragment or a
variant thereof and a complementary sequence thereto in a sample, said kit comprising:

a) a nucleic acid probe or a plurality of nucleic acid probes which can hybridize with a
nucleotide sequence included in a nucleic acid selected from the group consisting of the nucleotide
sequences of SEQ ID Nos 1-4, 9-13, a fragment or a variant thereof and a complementary sequence
thereto; and

b) optionally, reagents necessary for performing the hybridization reaction.

In a first preferred embodiment of this detection method and kit, said nucleic acid probe or
the plurality of nucleic acid probes are labeled with a detectable molecule. In a second preferred
embodiment of said method and kit, said nucleic acid probe or the plurality of nucleic acid probes
has been immobilized on a substrate. In a third preferred embodiment, the nucleic acid probe or the
plurality of nucleic acid probes comprise either a sequence which is selected from the group
consisting of the nucleotide sequences of P1 to P58 and the complementary sequences thereto, B1 to
B38, C1 to C38, D1 to D58, E1 to E58 or a biallelic marker selected from the group consisting of A1
to A58 and the complements thereto, preferably a nucleic acid comprising a polynucleotide selected
from the group of the nucleotide sequences of P1 to P27, P34, P37 to P41, P43 to P49, P52, and P54
to P58, and the complementary sequences thereto, B1 to B15, B22, B24, B25, B27 to 29, B32, B34
to B38, C1 to C15, C22, C24, C25, C27 to 29, C32, C34 to C38, D1 to D27, D34, D37 to D41, D43
to D49, D52, D54 to D58, E1 to E27, E34, E37 to E41, E43 to E49, E52, and E54 to E58, or a
biallelic marker selected from the group consisting of A1 to A27, A34, A37 to A41, A43 to A49,
A52, and A54 to A58, and the complements thereof.

Oligonucleotide Arrays 

A substrate comprising a plurality of oligonucleotide primers or probes of the invention
may be used either for detecting or amplifying targeted sequences in the BAP28 gene and may also
be used for detecting mutations in the coding or in the non-coding sequences of the BAP28 gene.

Any polynucleotide provided herein may be attached in overlapping areas or at random
locations on the solid support. Alternatively, the polynucleotides of the invention may be attached in
an ordered array wherein each polynucleotide is attached to a distinct region of the solid support
which does not overlap with the attachment site of any other polynucleotide. Preferably, such an
ordered array of polynucleotides is designed to be "addressable" where the distinct locations are
recorded and can be accessed as part of an assay procedure. Addressable polynucleotide arrays
typically comprise a plurality of different oligonucleotide probes that are coupled to a surface of a
substrate in different known locations. The knowledge of the precise location of each
polynucleotide makes these "addressable" arrays particularly useful in hybridization
assays. Any addressable array technology known in the art can be employed with the
polynucleotides of the invention. One particular embodiment of these polynucleotide arrays is
known as the Genechips™, and has been generally described in US Patent 5,143,854; PCT
publications WO 90/15070 and 92/10092. These arrays may generally be produced using
mechanical synthesis methods or light directed synthesis methods which incorporate a combination
of photolithographic methods and solid phase oligonucleotide synthesis (Fodor et al., 1991). The
immobilization of arrays of oligonucleotides on solid supports has been rendered possible by the
development of a technology generally identified as "Very Large Scale Immobilized Polymer
Synthesis" (VLSIPS™) in which, typically, probes are immobilized in a high density array on a
solid surface of a chip. Examples of VLSIPS™ technologies are provided in US Patents 5,143,854
and 5,412,087 and in PCT Publications WO 90/15070, WO 92/10092 and WO 95/11995, which
describe methods for forming oligonucleotide arrays through techniques such as light-directed
synthesis techniques. In designing strategies aimed at providing arrays of nucleotides immobilized
on solid supports, further presentation strategies were developed to order and display the
oligonucleotide arrays on the chips in an attempt to maximize hybridization patterns and sequence
information. Examples of such presentation strategies are disclosed in PCT Publications WO
94/12305, WO 94/11530, WO 97/29212 and WO 97/31256.

In another embodiment of the oligonucleotide arrays of the invention, an oligonucleotide
probe matrix may advantageously be used to detect mutations occurring in the BAP28 gene and in its
regulatory region. For this particular purpose, probes are specifically designed to have a nucleotide
sequence allowing their hybridization to the genes that carry known mutations (either by deletion,
insertion or substitution of one or several nucleotides). By known mutations is meant mutations
in the BAP28 gene that have been identified according to, for example, the technique used by Huang
et al. (1996) or Samson et al. (1996).

Another technique that is used to detect mutations in the BAP28 gene is the use of a
high-density DNA array. Each oligonucleotide probe constituting a unit element of the high-density
DNA array is designed to match a specific subsequence of the BAP28 genomic DNA or cDNA. Thus, an
array consisting of oligonucleotides complementary to subsequences of the target gene sequence is
used to determine the identity of the target sequence with the wild-type gene sequence, measure its
amount, and detect differences between the target sequence and the reference wild-type gene sequence
of the BAP28 gene. One such design, termed a 4L tiled array, implements a set of four probes (A,
C, G, T), preferably 15-nucleotide oligomers, for each position of the target. In each set of four
probes, the perfect complement will hybridize more strongly than mismatched probes. Consequently, a
nucleic acid target of length L is scanned for mutations with a tiled array containing 4L probes, the
whole probe set containing all the possible mutations in the known wild-type reference sequence. The
hybridization signals of the 15-mer probe set tiled array are perturbed by a single base change in the
target sequence. As a consequence, there is a characteristic loss of signal or a "footprint" for the
probes flanking a mutation position. This technique was described by Chee et al. in 1996, which is
herein incorporated by reference.
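
The 4L tiling principle can be sketched as follows. This is a minimal illustration under simplifying assumptions and not the design of Chee et al.: hybridization is modeled as an exact match between a probe and the sample window at the same coordinates, only positions with a full 15-nucleotide window are interrogated (so 4 x (L - 14) probes are produced rather than 4L), and the 40-nucleotide reference and all function names are hypothetical.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def tiled_probes(reference: str, probe_len: int = 15):
    """Yield (center, base, probe) for each interrogated position.

    Each position gets four probes that differ only at the central base
    and are complementary to the corresponding variant of the reference.
    """
    half = probe_len // 2
    ref = reference.upper()
    for center in range(half, len(ref) - half):
        window = ref[center - half:center + half + 1]
        for base in "ACGT":
            variant = window[:half] + base + window[half + 1:]
            yield center, base, reverse_complement(variant)


def call_bases(reference: str, sample: str, probe_len: int = 15) -> dict:
    """Return {position: base} wherever one probe matches the sample perfectly.

    Assumption: sample and reference have the same length and coordinates.
    Positions flanking a substitution get no call, which mimics the
    "footprint" of lost signal around a mutation.
    """
    half = probe_len // 2
    sample = sample.upper()
    calls = {}
    for center, base, probe in tiled_probes(reference, probe_len):
        window = sample[center - half:center + half + 1]
        if reverse_complement(probe) == window:
            calls[center] = base
    return calls


if __name__ == "__main__":
    ref = "ATGGCTAGCTTACGGATCCGTAACTGGCATTCGAGTCCAT"  # hypothetical reference
    mutant = ref[:20] + "G" + ref[21:]                # T -> G at position 20
    calls = call_bases(ref, mutant)
    print(calls[20], sorted(calls))
```

In this idealized model the substituted base is called at the mutated position, while the seven positions on either side of it yield no perfect match.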

Consequently, the invention concerns an array of nucleic acid molecules comprising at
least one polynucleotide described above as probes and primers. Preferably, the invention concerns
an array of nucleic acid comprising at least two polynucleotides described above as probes and
primers.

A further object of the invention consists of an array of nucleic acid sequences comprising
either at least one of the sequences selected from the group consisting of P1 to P58, B1 to B38, C1 to
C38, D1 to D58, E1 to E58, the sequences complementary thereto, a fragment thereof of at least 8,
10, 12, 15, 18, 20, 25, 30, or 40 consecutive nucleotides thereof, or at least one sequence comprising
a biallelic marker selected from the group consisting of A1 to A58 and the complements thereto,
preferably either at least one of the sequences selected from the group consisting of P1 to P27, P34,
P37 to P41, P43 to P49, P52, P54 to P58, B1 to B15, B22, B24, B25, B27 to 29, B32, B34 to B38,
C1 to C15, C22, C24, C25, C27 to 29, C32, C34 to C38, D1 to D27, D34, D37 to D41, D43 to D49,
D52, D54 to D58, E1 to E27, E34, E37 to E41, E43 to E49, E52, and E54 to E58, or at least one
sequence comprising a biallelic marker selected from the group consisting of A1 to A27, A34, A37
to A41, A43 to A49, A52, and A54 to A58, and the complements thereof.

The invention also pertains to an array of nucleic acid sequences comprising either at least
two of the sequences selected from the group consisting of P1 to P58, B1 to B38, C1 to C38, D1 to
D58, E1 to E58, the sequences complementary thereto, a fragment thereof of at least 8 consecutive
nucleotides thereof, or at least two sequences comprising a biallelic marker selected from the group
consisting of A1 to A58 and the complements thereof, preferably either at least two of the sequences
selected from the group consisting of P1 to P27, P34, P37 to P41, P43 to P49, P52, P54 to P58, B1
to B15, B22, B24, B25, B27 to 29, B32, B34 to B38, C1 to C15, C22, C24, C25, C27 to 29, C32,
C34 to C38, D1 to D27, D34, D37 to D41, D43 to D49, D52, D54 to D58, E1 to E27, E34, E37 to
E41, E43 to E49, E52, and E54 to E58 or at least two sequences comprising a biallelic marker
selected from the group consisting of A1 to A27, A34, A37 to A41, A43 to A49, A52, and A54 to
A58, and the complements thereof.

Amplification of the BAP28 gene. 

1. DNA extraction 

As for the source of the genomic DNA to be subjected to analysis, any test sample can be
foreseen without any particular limitation. These test samples include biological samples which can
be tested by the methods of the present invention described herein and include human and animal
body fluids such as whole blood, serum, plasma, cerebrospinal fluid, urine, lymph fluids, and
various external secretions of the respiratory, intestinal and genitourinary tracts, tears, saliva, milk,
white blood cells, myelomas and the like; biological fluids such as cell culture supernatants; fixed
tissue specimens including tumor and non-tumor tissue and lymph node tissues; bone marrow
aspirates and fixed cell specimens. The preferred source of genomic DNA used in the context of the
present invention is the peripheral venous blood of each donor.

The techniques of DNA extraction are well-known to the skilled technician. Such 
techniques are described notably by Mackey et al. (1998). 

2. DNA amplification 

DNA amplification techniques are well-known to those skilled in the art. Amplification
techniques that can be used in the context of the present invention include, but are not limited to, the
ligase chain reaction (LCR) described in EP-A-320 308, WO 9320227 and EP-A-439 182, the
disclosures of which are incorporated herein by reference, the polymerase chain reaction (PCR,
RT-PCR) and techniques such as the nucleic acid sequence based amplification (NASBA) described in
Guatelli JC, et al. (1990) and in Compton J. (1991), Q-beta amplification as described in European
Patent Application no 4544610, strand displacement amplification as described in Walker et al.
(1996) and EP A 684 315, and target mediated amplification as described in PCT Publication WO
9322461, the disclosure of which is incorporated herein by reference.

LCR and Gap LCR are exponential amplification techniques; both depend on DNA ligase
to join adjacent primers annealed to a DNA molecule. In Ligase Chain Reaction (LCR), probe pairs
are used which include two primary (first and second) and two secondary (third and fourth) probes,
all of which are employed in molar excess to target. The first probe hybridizes to a first segment of
the target strand and the second probe hybridizes to a second segment of the target strand, the first
and second segments being contiguous so that the primary probes abut one another in a 5' phosphate-
3' hydroxyl relationship, and so that a ligase can covalently fuse or ligate the two probes into a fused
product. In addition, a third (secondary) probe can hybridize to a portion of the first probe and a
fourth (secondary) probe can hybridize to a portion of the second probe in a similar abutting fashion.
Of course, if the target is initially double stranded, the secondary probes also will hybridize to the
target complement in the first instance. Once the ligated strand of primary probes is separated from
the target strand, it will hybridize with the third and fourth probes, which can be ligated to form a
complementary, secondary ligated product. It is important to realize that the ligated products are
functionally equivalent to either the target or its complement. By repeated cycles of hybridization
and ligation, amplification of the target sequence is achieved. A method for multiplex LCR has also
been described (WO 9320227). Gap LCR (GLCR) is a version of LCR where the probes are not
adjacent but are separated by 2 to 3 bases.
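
The difference between abutting probes (LCR) and probes separated by a short gap (Gap LCR) can be illustrated by locating the two primary probe binding sites on a target strand. In this sketch each probe is represented, for simplicity, by the target-strand sequence it covers rather than by its complementary sequence; the target, the probe sites and the function names are hypothetical.

```python
from typing import Optional


def probe_gap(target: str, first_site: str, second_site: str) -> Optional[int]:
    """Return the number of target bases between the two probe binding sites.

    Returns None if the sites are not both present in the expected order.
    """
    target = target.upper()
    start_first = target.find(first_site.upper())
    if start_first == -1:
        return None
    end_first = start_first + len(first_site)
    start_second = target.find(second_site.upper(), end_first)
    if start_second == -1:
        return None
    return start_second - end_first


def classify(gap: Optional[int]) -> str:
    """Classify a probe pair by the spacing of its binding sites."""
    if gap is None:
        return "no ordered binding sites"
    if gap == 0:
        return "LCR"          # probes abut and can be ligated directly
    if 2 <= gap <= 3:
        return "Gap LCR"      # probes separated by 2 to 3 bases
    return "other"


if __name__ == "__main__":
    target = "AAACCCGGGTTTACGTACGT"  # hypothetical target strand
    print(classify(probe_gap(target, "AAACCCGGG", "TTTACGT")))
    print(classify(probe_gap(target, "AAACCCGGG", "ACGTACGT")))
```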

For amplification of mRNAs, it is within the scope of the present invention to reverse
transcribe mRNA into cDNA followed by polymerase chain reaction (RT-PCR); or, to use a single
enzyme for both steps as described in U.S. Patent No 5,322,770; or, to use Asymmetric Gap LCR
(RT-AGLCR) as described by Marshall et al. (1994). AGLCR is a modification of GLCR that
allows the amplification of RNA.

The PCR technology is the preferred amplification technique used in the present invention.
A variety of PCR techniques are familiar to those skilled in the art. For a review of PCR
technology, see White (1997) and the publication entitled "PCR Methods and Applications" (1991,
Cold Spring Harbor Laboratory Press). In each of these PCR procedures, PCR primers on either
side of the nucleic acid sequences to be amplified are added to a suitably prepared nucleic acid
sample along with dNTPs and a thermostable polymerase such as Taq polymerase, Pfu polymerase,
or Vent polymerase. The nucleic acid in the sample is denatured and the PCR primers are
specifically hybridized to complementary nucleic acid sequences in the sample. The hybridized
primers are extended. Thereafter, another cycle of denaturation, hybridization, and extension is
initiated. The cycles are repeated multiple times to produce an amplified fragment containing the
nucleic acid sequence between the primer sites. PCR has further been described in several patents
including US Patents 4,683,195, 4,683,202 and 4,965,188. Each of these publications is
incorporated by reference.
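
The relationship between the primer pair and the amplified fragment can be sketched in silico. This is an idealized illustration that assumes exact primer matches on a single-stranded template representation and, for the copy-number estimate, a fixed per-cycle efficiency; the template, primers and function names are hypothetical and are not sequences of the invention.

```python
from typing import Optional

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def amplicon(template: str, forward: str, reverse: str) -> Optional[str]:
    """Return the fragment bounded by the two primers, or None.

    The forward primer is searched on the given strand; the reverse primer
    anneals to that strand, so its reverse complement is searched downstream.
    """
    template = template.upper()
    start = template.find(forward.upper())
    if start == -1:
        return None
    site = reverse_complement(reverse.upper())
    end = template.find(site, start + len(forward))
    if end == -1:
        return None
    return template[start:end + len(site)]


def copies_after(cycles: int, initial: int = 1, efficiency: float = 1.0) -> float:
    """Idealized copy number after a number of cycles.

    Assumption: every cycle multiplies the product by (1 + efficiency);
    efficiency 1.0 corresponds to perfect doubling.
    """
    return initial * (1 + efficiency) ** cycles


if __name__ == "__main__":
    template = "TTTTATGGCTAGCTTACGGATCCGTAACTGGCATCCCC"  # hypothetical
    fwd = "ATGGCTAGCT"
    rev = reverse_complement("TAACTGGCAT")
    print(amplicon(template, fwd, rev))
    print(copies_after(30))
```

Real reactions plateau well before the ideal doubling predicted for high cycle numbers, so the copy-number function is only an upper bound.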

One of the aspects of the present invention is a method for the amplification of the human
BAP28 gene, particularly of the genomic sequences of SEQ ID No 1 or of the cDNA sequence of
SEQ ID No 2, or a fragment or a variant thereof in a test sample, preferably using the PCR
technology. The method comprises the steps of contacting a test sample suspected of containing the
target BAP28 encoding sequence or portion thereof with amplification reaction reagents comprising
a pair of amplification primers, and optionally, in some instances, a detection probe that can hybridize
with an internal region of amplicon sequences to confirm that the desired amplification reaction has
taken place.

Thus, the present invention also relates to a method for the amplification of a human
BAP28 gene sequence, particularly of a portion of the genomic sequences of SEQ ID No 1 or of the
cDNA sequence of SEQ ID No 2, 3 or 4, or a variant thereof in a test sample, said method
comprising the steps of:

a) contacting a test sample suspected of containing the targeted BAP28 gene sequence
comprised in a nucleotide sequence selected from a group consisting of SEQ ID Nos 1-4, or
fragments or variants thereof with amplification reaction reagents comprising a pair of amplification
primers as described above and located on either side of the polynucleotide region to be amplified;
and

b) optionally, detecting the amplification products.

The invention also concerns a kit for the amplification of a human BAP28 gene sequence,
particularly of a portion of the genomic sequence of SEQ ID No 1 or of the cDNA sequence of SEQ
ID No 2, 3 or 4, or a variant thereof in a test sample, wherein said kit comprises:

a) a pair of oligonucleotide primers located on either side of the BAP28 region to be
amplified; and

b) optionally, the reagents necessary for performing the amplification reaction.

In a first preferred embodiment of the above amplification method or kit, the amplification
product is detected by hybridization with a labeled probe having a sequence which is complementary
to the amplified region. In another embodiment of the above amplification method and kit, primers
comprise a sequence which is selected from the group consisting of the nucleotide sequences of B1
to B38, C1 to C38, D1 to D58, and E1 to E58, preferably B1 to B15, B22, B24, B25, B27 to 29,
B32, B34 to B38, C1 to C15, C22, C24, C25, C27 to 29, C32, C34 to C38, D1 to D27, D34, D37 to
D41, D43 to D49, D52, D54 to D58, E1 to E27, E34, E37 to E41, E43 to E49, E52, and E54 to E58.

The primers are more particularly characterized in that they have sufficient
complementarity with any sequence of a strand of the genomic sequence close to the region to be
amplified, for example with a non-coding sequence adjacent to the exons to be amplified.

BAP28 Proteins and Polypeptide Fragments: 
The BAP28 protein is 2144 amino acids in length. This protein is highly conserved in
various species such as Drosophila melanogaster, Arabidopsis thaliana, Schizosaccharomyces
pombe, Caenorhabditis elegans, Saccharomyces cerevisiae and Tetraodon nigroviridis. The protein
alignment between the human BAP28 and the proteins from Drosophila melanogaster, Arabidopsis
thaliana, Schizosaccharomyces pombe, Caenorhabditis elegans and Saccharomyces cerevisiae is
disclosed in Figure 3. The protein alignment between the human BAP28 and the protein from
Tetraodon nigroviridis is disclosed in Figure 4. The BAP28 protein is also well conserved
among mammals. Indeed, several ESTs with good homology to the human BAP28 have
been identified. Some examples of ESTs are the following (Genbank Accession Number/species):
AW423202/zebrafish; AW481398/Bos taurus; AW325866/Bos taurus; AW353291/Bos taurus;
AW315340/Bos taurus; AA681616/mouse; AV120680/Mus musculus; and D77458/mouse.

Analysis of the BAP28 protein sequence revealed several potential phosphorylation sites
and N-glycosylation sites in BAP28. More particularly, protein kinase C phosphorylation sites have
been identified in positions 199-201, 269-271, 387-389, 415-417, 508-510, 650-652, 717-719, 778-
780, 792-794, 884-886, 903-905, 999-1001, 1091-1093, 1349-1351, 1506-1508, 1573-1575, 1614-
1616, 1632-1634, 1673-1675, 1743-1745, 1808-1810, 1829-1831, 1911-1913, and 2077-2079 of
SEQ ID No 4; casein kinase II phosphorylation sites have been identified in positions 22-25, 50-53,
253-256, 363-366, 408-411, 409-412, 508-511, 539-542, 590-593, 689-692, 717-720, 745-748, 961-
964, 979-982, 1091-1094, 1105-1108, 1195-1198, 1492-1495, 1723-1726, 1882-1885, 1972-1975,
and 1981-1984 of SEQ ID No 4. In addition, several potential N-glycosylation sites have been
identified in positions 93-96, 154-157, 776-779, 882-885, 1347-1350, 1488-1491, 1630-1633, 1746-
1749, and 1970-1973 of SEQ ID No 5. A conserved HEAT_REPEAT motif has been identified in
positions 2106-2139 of SEQ ID No 5. HEAT_REPEAT motifs are generally involved in protein-
protein interactions. The PCT application WO 98/12327 showed that BAP28 should be involved in
an interaction with BRCA1.
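
Site lists of this kind can be obtained by scanning a protein sequence with short consensus patterns. The sketch below uses the widely cited PROSITE-style consensus patterns for protein kinase C sites ([ST]-x-[RK]), casein kinase II sites ([ST]-x(2)-[DE]) and N-glycosylation sites (N-{P}-[ST]-{P}); the specification does not state which tool or patterns were actually used, so this choice is an assumption, and the toy peptide is hypothetical.

```python
import re

# Assumed consensus patterns (PROSITE style), written as regular expressions.
PATTERNS = {
    "PKC_PHOSPHO_SITE": r"[ST].[RK]",
    "CK2_PHOSPHO_SITE": r"[ST]..[DE]",
    "ASN_GLYCOSYLATION": r"N[^P][ST][^P]",
}


def find_sites(protein: str, pattern: str) -> list:
    """Return 1-based inclusive (start, end) positions of every match.

    A lookahead is used so that overlapping sites are all reported.
    """
    regex = re.compile(f"(?=({pattern}))")
    return [
        (m.start() + 1, m.start() + len(m.group(1)))
        for m in regex.finditer(protein.upper())
    ]


def scan(protein: str) -> dict:
    """Scan a protein with every pattern in PATTERNS."""
    return {name: find_sites(protein, pat) for name, pat in PATTERNS.items()}


if __name__ == "__main__":
    peptide = "MSARNGTLSEEDK"  # hypothetical toy peptide, not BAP28
    for name, sites in scan(peptide).items():
        print(name, sites)
```

Overlapping matches are reported separately, which is consistent with the overlapping casein kinase II positions 408-411 and 409-412 listed above. Such pattern hits are only potential sites and occur frequently by chance.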

The term "BAP28 polypeptides" is used herein to embrace all of the proteins and
polypeptides of the present invention. Also forming part of the invention are polypeptides encoded
by the polynucleotides of the invention, as well as fusion polypeptides comprising such
polypeptides. The invention embodies BAP28 proteins from humans, including isolated or purified
BAP28 proteins consisting of, consisting essentially of, or comprising the sequence of SEQ ID No 5 or
fragments thereof. The present invention also embodies isolated, purified, and recombinant
polypeptides comprising a contiguous span of at least 6 amino acids, preferably at least 8 to 10
amino acids, more preferably at least 12, 15, 20, 25, 30, 40, 50, or 100 amino acids of SEQ ID No 5,
wherein said contiguous span includes at least 1, 2, 3, 5 or 10 of the amino acid positions 1 to 1629
of the SEQ ID No 5. The present invention also embodies isolated, purified, and recombinant
polypeptides comprising a contiguous span of at least 6 amino acids, preferably at least 8 to 10
amino acids, more preferably at least 12, 15, 20, 25, 30, 40, 50, or 100 amino acids of SEQ ID No 5,
wherein said contiguous span includes an amino acid selected from the group consisting of an
asparagine at the amino acid position 1694 of SEQ ID No 5, a valine at the amino acid position 1854
of SEQ ID No 5, an asparagine at the amino acid position 1967 of SEQ ID No 5, a glutamic acid at
the amino acid position 2017 of SEQ ID No 5, and an alanine at the amino acid position 2050 of
SEQ ID No 5. In other preferred embodiments the BAP28 protein contains an alanine residue at
amino acid position 2050 in SEQ ID No 5.

Four biallelic markers of the present invention, namely A16, A19, A21 and A25, result in
an amino acid sequence change. Indeed, the biallelic marker A16 encodes a Ser or an Asn residue at
position 1694 of the BAP28 protein; the biallelic marker A19 encodes an Ala or a Val residue at
position 1854 of the BAP28 protein; the biallelic marker A21 encodes an Asp or an Asn at position
1967 of the BAP28 protein; and the biallelic marker A25 encodes a Gly or a Glu at position 2017
of the BAP28 protein. The invention encompasses the BAP28 proteins comprising all the
combinations of the above-described residues at the positions 1694, 1854, 1967, and 2017.
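
Since each of the four positions can carry one of two residues, the combinations referred to above number 2 x 2 x 2 x 2 = 16. A minimal sketch enumerating them follows; the data structure and function name are illustrative only.

```python
from itertools import product

# Alternative residues at each variable position, as listed for
# biallelic markers A16, A19, A21 and A25.
VARIANT_SITES = {
    1694: ("Ser", "Asn"),  # A16
    1854: ("Ala", "Val"),  # A19
    1967: ("Asp", "Asn"),  # A21
    2017: ("Gly", "Glu"),  # A25
}


def residue_combinations(sites=VARIANT_SITES) -> list:
    """Return every combination of residues as a {position: residue} dict."""
    positions = sorted(sites)
    choices = (sites[p] for p in positions)
    return [dict(zip(positions, combo)) for combo in product(*choices)]


if __name__ == "__main__":
    combos = residue_combinations()
    print(len(combos))
    for combo in combos:
        print(combo)
```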

The variant protein and fragments thereof which contain an asparagine at the amino acid
position 1694 of SEQ ID No 5 are collectively referred to herein as "1694-Asn variants". The variant
protein and fragments thereof which contain a valine at the amino acid position 1854 of SEQ ID No
5 are collectively referred to herein as "1854-Val variants". The variant protein and fragments
thereof which contain an asparagine at the amino acid position 1967 of SEQ ID No 5 are
collectively referred to herein as "1967-Asn variants". The variant protein and fragments thereof
which contain a glutamic acid at the amino acid position 2017 of SEQ ID No 5 are collectively
referred to herein as "2017-Glu variants". The variant protein and fragments thereof which contain
an alanine at the amino acid position 2050 of SEQ ID No 5 are collectively referred to herein as
"2050-Ala variants". In other preferred embodiments of the polypeptides of the present invention,
the contiguous stretch of amino acids comprises the site of a mutation or functional mutation,
including a deletion, addition, swap or truncation of the amino acids in the BAP28 protein sequence.

The invention also encompasses a purified, isolated, or recombinant polypeptide
comprising an amino acid sequence having at least 70, 75, 80, 85, 90, 95, 98 or 99% amino acid
identity with the amino acid sequence of SEQ ID No 5 or a fragment thereof.
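
A percent identity value of this kind can be computed from a pairwise alignment. The sketch below assumes the two sequences have already been aligned to equal length with "-" as the gap character, and it counts identical residues over all alignment columns; other conventions for the denominator exist, and the specification does not prescribe one here. The example peptides and function names are hypothetical.

```python
THRESHOLDS = (70, 75, 80, 85, 90, 95, 98, 99)


def percent_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """Percent identity over the columns of a pre-computed pairwise alignment.

    Assumption: identical residues divided by the number of alignment
    columns, ignoring columns where both sequences have a gap.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    columns = [
        (a, b)
        for a, b in zip(aligned_a.upper(), aligned_b.upper())
        if not (a == gap and b == gap)
    ]
    if not columns:
        raise ValueError("alignment has no informative columns")
    matches = sum(1 for a, b in columns if a == b and a != gap)
    return 100.0 * matches / len(columns)


def thresholds_met(identity: float) -> list:
    """Return the listed identity thresholds satisfied by a given value."""
    return [t for t in THRESHOLDS if identity >= t]


if __name__ == "__main__":
    identity = percent_identity("MKTAYIAKQR", "MKTAYLAKQR")  # hypothetical peptides
    print(identity, thresholds_met(identity))
```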

The invention concerns the polypeptides which are encoded by a nucleic acid comprising a
sequence selected from the group consisting of the sequences of SEQ ID Nos 1-3 or fragments thereof.

BAP28 proteins are preferably isolated from human or mammalian tissue samples or
expressed from human or mammalian genes. The BAP28 polypeptides of the invention can be made
using routine expression methods known in the art. The polynucleotide encoding the desired
polypeptide is ligated into an expression vector suitable for any convenient host. Both eukaryotic
and prokaryotic host systems may be used in forming recombinant polypeptides, and a summary of
some of the more common systems. The polypeptide is then isolated from lysed cells or from the
culture medium and purified to the extent needed for its intended use. Purification is by any
technique known in the art, for example, differential extraction, salt fractionation, chromatography,
centrifugation, and the like. See, for example, Methods in Enzymology for a variety of methods for
purifying proteins.

In addition, shorter protein fragments are produced by chemical synthesis. Alternatively, the
proteins of the invention are extracted from cells or tissues of humans or non-human animals.
Methods for purifying proteins are known in the art, and include the use of detergents or chaotropic
agents to disrupt particles followed by differential extraction and separation of the polypeptides by
ion exchange chromatography, affinity chromatography, sedimentation according to density, and gel
electrophoresis.

Any BAP28 cDNA, including SEQ ID Nos 2 and 3, or fragments thereof is used to express
BAP28 proteins and polypeptides. The nucleic acid encoding the BAP28 protein or fragments thereof
to be expressed is operably linked to a promoter in an expression vector using conventional cloning
technology. The BAP28 insert in the expression vector may comprise the full coding sequence for the
BAP28 protein or a portion thereof. For example, the BAP28 derived insert may encode a polypeptide
comprising at least 10 consecutive amino acids of the BAP28 protein of SEQ ID No 5, wherein said
contiguous span includes at least 1, 2, 3, 5 or 10 of the amino acid positions 1 to 1629 of SEQ ID
No 5, or wherein said polypeptide is a 2050-Ala variant BAP28 polypeptide.

The expression vector is any of the mammalian, yeast, insect or bacterial expression systems 
known in the art. Commercially available vectors and expression systems are available from a variety 
of suppliers including Genetics Institute (Cambridge, MA), Stratagene (La Jolla, California), Promega
(Madison, Wisconsin), and Invitrogen (San Diego, California). If desired, to enhance expression and 
facilitate proper protein folding, the codon context and codon pairing of the sequence is optimized for 
the particular expression organism in which the expression vector is introduced, as explained by 
Hatfield, et al., U.S. Patent No 5,082,767. 
In one embodiment, the entire coding sequence of the BAP28 cDNA through the poly A
signal of the cDNA is operably linked to a promoter in the expression vector. Alternatively, if the
nucleic acid encoding a portion of the BAP28 protein lacks a methionine to serve as the initiation site,
an initiating methionine can be introduced next to the first codon of the nucleic acid using conventional
techniques. Similarly, if the insert from the BAP28 cDNA lacks a poly A signal, this sequence can be
added to the construct by, for example, splicing out the poly A signal from pSG5 (Stratagene) using
BglI and SalI restriction endonuclease enzymes and incorporating it into the mammalian expression
vector pXT1 (Stratagene). pXT1 contains the LTRs and a portion of the gag gene from Moloney
Murine Leukemia Virus. The position of the LTRs in the construct allows efficient stable transfection.
The vector includes the Herpes Simplex Thymidine Kinase promoter and the selectable neomycin gene.
The nucleic acid encoding the BAP28 protein or a portion thereof is obtained by PCR from a bacterial
vector containing the BAP28 cDNA of SEQ ID No 2 or 3 using oligonucleotide primers complementary
to the BAP28 cDNA or portion thereof and containing restriction endonuclease sequences for PstI
incorporated into the 5' primer and BglII at the 5' end of the corresponding cDNA 3' primer, taking care
to ensure that the sequence encoding the BAP28 protein or a portion thereof is positioned properly with
respect to the poly A signal. The purified fragment obtained from the resulting PCR reaction is digested
with PstI, blunt ended with an exonuclease, digested with BglII, purified and ligated to pXT1, now
containing a poly A signal and digested with BglII.
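
The initiating-methionine step mentioned above can be sketched in Python as follows. This is a minimal illustration only: the function name `ensure_start_codon` and the six-base example fragment are hypothetical, and the sketch assumes the fragment is already in the intended reading frame.

```python
def ensure_start_codon(coding_fragment: str) -> str:
    """Return the fragment with an ATG start codon, adding one if it is missing.

    Assumes the fragment is given 5'->3' and is already in frame, so its
    length must be a multiple of three.
    """
    seq = coding_fragment.upper()
    if len(seq) % 3 != 0:
        raise ValueError("fragment length must be a multiple of 3 to stay in frame")
    if seq.startswith("ATG"):
        return seq  # an initiating methionine codon is already present
    return "ATG" + seq  # introduce the initiating methionine next to the first codon


# Hypothetical fragment encoding Ala-Glu with no start codon.
print(ensure_start_codon("gctgaa"))
```

In practice the extra codon would normally be carried on the 5' PCR primer together with the restriction site, rather than being added to the sequence after the fact.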

The ligated product is transfected into mouse NIH 3T3 cells using Lipofectin (Life
Technologies, Inc., Grand Island, New York) under conditions outlined in the product specification.
Positive transfectants are selected after growing the transfected cells in 600 µg/ml G418 (Sigma, St.
Louis, Missouri).

Alternatively, the nucleic acids encoding the BAP28 protein or a portion thereof are cloned into
pED6dpc2 (Genetics Institute, Cambridge, MA). The resulting pED6dpc2 constructs are transfected into
a suitable host cell, such as COS 1 cells. Methotrexate resistant cells are selected and expanded.
The above procedures may also be used to express a mutant BAP28 protein responsible for a
detectable phenotype or a portion thereof.
The expressed proteins are purified using conventional purification techniques such as
ammonium sulfate precipitation or chromatographic separation based on size or charge. The protein
encoded by the nucleic acid insert may also be purified using standard immunochromatography
techniques. In such procedures, a solution containing the expressed BAP28 protein or portion thereof,
such as a cell extract, is applied to a column having antibodies against the BAP28 protein or portion
thereof attached to the chromatography matrix. The expressed protein is allowed to bind the
immunochromatography column. Thereafter, the column is washed to remove non-specifically bound
proteins. The specifically bound expressed protein is then released from the column and recovered
using standard techniques.

To confirm expression of the BAP28 protein or a portion thereof, the proteins expressed from
host cells containing an expression vector containing an insert encoding the BAP28 protein or a portion
thereof can be compared to the proteins expressed in host cells containing the expression vector without
an insert. The presence of a band in samples from cells containing the expression vector with an insert
which is absent in samples from cells containing the expression vector without an insert indicates that
the BAP28 protein or a portion thereof is being expressed. Generally, the band will have the mobility
expected for the BAP28 protein or portion thereof. However, the band may have a mobility different
than that expected as a result of modifications such as glycosylation, ubiquitination, or enzymatic
cleavage.

Antibodies capable of specifically recognizing the expressed BAP28 protein or a portion
thereof are described below.

If antibody production is not possible, the nucleic acids encoding the BAP28 protein or a
portion thereof are incorporated into expression vectors designed for use in purification schemes
employing chimeric polypeptides. In such strategies the nucleic acid encoding the BAP28 protein or a
portion thereof is inserted in frame with the gene encoding the other half of the chimera. The other half
of the chimera is a β-globin or a nickel binding polypeptide encoding sequence. A chromatography
matrix having antibody to β-globin or nickel attached thereto is then used to purify the chimeric protein.
Protease cleavage sites are engineered between the β-globin gene or the nickel binding polypeptide and
the BAP28 protein or portion thereof. Thus, the two polypeptides of the chimera are separated from one
another by protease digestion.

One useful expression vector for generating β-globin chimerics is pSG5 (Stratagene), which
encodes rabbit β-globin. Intron II of the rabbit β-globin gene facilitates splicing of the expressed
transcript, and the polyadenylation signal incorporated into the construct increases the level of
expression. These techniques are well known to those skilled in the art of molecular biology. Standard
methods are published in methods texts such as Davis et al., (1986) and many of the methods are
available from Stratagene, Life Technologies, Inc., or Promega. Polypeptide may additionally be
produced from the construct using in vitro translation systems such as the In vitro Express™ Translation
Kit (Stratagene).
Antibodies That Bind BAP28 Polypeptides of the Invention

Any BAP28 polypeptide or whole protein may be used to generate antibodies capable of
specifically binding to expressed BAP28 protein or fragments thereof as described. The antibody
compositions of the invention are capable of specifically binding or specifically bind to the BAP28
protein. For an antibody composition to specifically bind to the BAP28 protein it must demonstrate
at least a 5%, 10%, 15%, 20%, 25%, 50%, or 100% greater binding affinity for full length BAP28
protein than for any full length protein in an ELISA, RIA, or other antibody-based binding assay.
For an antibody composition to specifically bind to the 1694-Asn, 1854-Val, 1967-Asn, 2017-Glu,
or 2050-Ala variant BAP28 protein, it must demonstrate at least a 5%, 10%, 15%, 20%, 25%, 50%,
or 100% greater binding affinity for full length 1694-Asn, 1854-Val, 1967-Asn, 2017-Glu, or
2050-Ala variant BAP28 protein than for respectively a 1694-Ser, 1854-Ala, 1967-Asp, 2017-Gly or
2050-Val full length protein in an ELISA, RIA, or other antibody-based binding assay. The present
invention also contemplates antibodies which are specific for a BAP28 protein comprising one
combination of the above-described residues at the positions 1694, 1854, 1967, and 2017.
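
The specific-binding criterion above is a simple relative comparison, which the following Python sketch makes explicit. The function name, the choice of affinity units and the numeric values are hypothetical; the specification states only the percentage thresholds and that the comparison is made in an ELISA, RIA, or other antibody-based binding assay.

```python
def specifically_binds(affinity_target: float,
                       affinity_reference: float,
                       min_percent_greater: float = 5.0) -> bool:
    """True if affinity for the target exceeds the reference by the given percentage.

    Both affinities must be in the same (assumed) units, where a larger value
    means tighter binding, e.g. an association constant.
    """
    if affinity_reference <= 0:
        raise ValueError("reference affinity must be positive")
    percent_greater = 100.0 * (affinity_target - affinity_reference) / affinity_reference
    return percent_greater >= min_percent_greater


# Hypothetical readings: variant protein vs. the corresponding non-variant protein.
print(specifically_binds(1.5, 1.0, min_percent_greater=50.0))
print(specifically_binds(1.04, 1.0))
```

The same comparison applies to each variant and reference pair listed above, for example a 1694-Asn protein against a 1694-Ser protein.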

In a preferred embodiment of the invention antibody compositions are capable of
selectively binding, or selectively bind to an epitope-containing fragment of a polypeptide
comprising a contiguous span of at least 6 amino acids, preferably at least 8 to 10 amino acids, more
preferably at least 12, 15, 20, 25, 30, 40, 50, or 100 amino acids of SEQ ID No 5, wherein said
epitope comprises at least 1, 2, 3, 5 or 10 of the amino acid positions selected from the group
consisting of 1 to 1629 and 2050 of SEQ ID No 5, wherein said antibody composition is optionally
either polyclonal or monoclonal. In another preferred embodiment, antibody compositions are
capable of selectively binding, or selectively bind to an epitope-containing fragment of a polypeptide
comprising a contiguous span of at least 6 amino acids, preferably at least 8 to 10 amino acids, more
preferably at least 12, 15, 20, 25, 30, 40, 50, or 100 amino acids of SEQ ID No 5, wherein said
epitope comprises an amino acid selected from the group consisting of an asparagine at the amino
acid position 1694 of SEQ ID No 5, a valine at the amino acid position 1854 of SEQ ID No 5, an
asparagine at the amino acid position 1967 of SEQ ID No 5, a glutamic acid at the amino acid
position 2017 of SEQ ID No 5, and an alanine at the amino acid position 2050 of SEQ ID No 5,
wherein said antibody composition is optionally either polyclonal or monoclonal.

The present invention also contemplates the use of polypeptides comprising a contiguous
span of at least 6 amino acids, preferably at least 8 to 10 amino acids, more preferably at least 12,
15, 20, 25, 50, or 100 amino acids of a BAP28 polypeptide in the manufacture of antibodies,
wherein said contiguous span comprises at least 1, 2, 3, 5 or 10 of the amino acid positions selected
from the group consisting of 1 to 1629 of SEQ ID No 5. The present invention further contemplates
the use of polypeptides comprising a contiguous span of at least 6 amino acids, preferably at least 8
to 10 amino acids, more preferably at least 12, 15, 20, 25, 50, or 100 amino acids of a BAP28
polypeptide in the manufacture of antibodies, wherein said contiguous span comprises an amino acid
selected from the group consisting of an asparagine at the amino acid position 1694 of SEQ ID No 5,
a valine at the amino acid position 1854 of SEQ ID No 5, an asparagine at the amino acid position
1967 of SEQ ID No 5, a glutamic acid at the amino acid position 2017 of SEQ ID No 5, and an
alanine at the amino acid position 2050 of SEQ ID No 5. In a preferred embodiment such
polypeptides are useful in the manufacture of antibodies to detect the presence and absence of the
BAP28 protein.
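
To make the notion of a contiguous span comprising a given variant position concrete, the Python sketch below enumerates every window of a chosen length that contains a specified residue position. The function name is hypothetical, and the protein length is passed as a parameter because the length of SEQ ID No 5 is not restated in this section; the value 2050 used in the example is only a lower bound implied by the highest position mentioned.

```python
def spans_containing(position: int, span_length: int, protein_length: int):
    """List 1-based inclusive (start, end) windows of span_length covering position."""
    if not 1 <= position <= protein_length:
        raise ValueError("position outside the protein")
    if not 1 <= span_length <= protein_length:
        raise ValueError("span length must fit within the protein")
    first_start = max(1, position - span_length + 1)
    last_start = min(position, protein_length - span_length + 1)
    return [(start, start + span_length - 1)
            for start in range(first_start, last_start + 1)]


# Toy protein of 10 residues: all 3-residue spans containing residue 5.
print(spans_containing(5, 3, 10))
# 6-residue spans containing position 1694, assuming a protein of at least 2050 residues.
print(len(spans_containing(1694, 6, 2050)))
```

For a span of n residues located away from the protein termini there are n such windows, which bounds the number of candidate peptides per variant position and span length.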

Non-human animals or mammals, whether wild-type or transgenic, which express a
different species of BAP28 than the one to which antibody binding is desired, and animals which do
not express BAP28 (i.e. a BAP28 knock out animal as described herein) are particularly useful for
preparing antibodies. BAP28 knock out animals will recognize all or most of the exposed regions of
BAP28 as foreign antigens, and therefore produce antibodies with a wider array of BAP28 epitopes.
Moreover, smaller polypeptides with only 10 to 30 amino acids may be useful in obtaining specific
binding to the BAP28 protein. In addition, the humoral immune system of animals which produce a
species of BAP28 that resembles the antigenic sequence will preferentially recognize the differences
between the animal's native BAP28 species and the antigen sequence, and produce antibodies to
these unique sites in the antigen sequence. Such a technique will be particularly useful in obtaining
antibodies that specifically bind to the BAP28 protein.

Antibody preparations prepared according to either protocol are useful in quantitative
immunoassays which determine concentrations of antigen-bearing substances in biological samples;
they are also used semi-quantitatively or qualitatively to identify the presence of antigen in a biological
sample. The antibodies may also be used in therapeutic compositions for killing cells expressing the
protein or reducing the levels of the protein in the body.

The antibodies of the invention may be labeled, either by a radioactive, a fluorescent or an 
enzymatic label. 

Consequently, the invention is also directed to a method for detecting specifically the
presence of a human BAP28 polypeptide according to the invention in a biological sample, said
method comprising the following steps:

a) bringing into contact the biological sample with a polyclonal or monoclonal antibody
directed against the BAP28 polypeptide of the amino acid sequence of SEQ ID No 5, or to a peptide
fragment or variant thereof;

b) detecting the antigen-antibody complex formed.

The invention also concerns a diagnostic kit for detecting in vitro the presence of a human
BAP28 polypeptide according to the present invention in a biological sample, wherein said kit
comprises:

a) a polyclonal or monoclonal antibody directed against the BAP28 polypeptide of the
amino acid sequence of SEQ ID No 5, or to a peptide fragment or variant thereof. In some
embodiments, the antibody may be labeled;

b) a reagent allowing the detection of the antigen-antibody complexes formed, said reagent
optionally being labeled, or being able to be recognized itself by a labeled reagent, more particularly
in the case when the above-mentioned monoclonal or polyclonal antibody is not labeled by itself.

BAP28-Related Biallelic Markers

Advantages Of The Biallelic Markers Of The Present Invention

The BAP28-related biallelic markers of the present invention offer a number of important
advantages over other genetic markers such as RFLP (Restriction Fragment Length Polymorphism)
and VNTR (Variable Number of Tandem Repeats) markers.

The first generation of markers were RFLPs, which are variations that modify the length
of a restriction fragment. But methods used to identify and to type RFLPs are relatively wasteful of
materials, effort, and time. The second generation of genetic markers were VNTRs, which can be
categorized as either minisatellites or microsatellites. Minisatellites are tandemly repeated DNA
sequences present in units of 5-50 repeats which are distributed along regions of the human
chromosomes ranging from 0.1 to 20 kilobases in length. Since they present many possible alleles,
their informative content is very high. Minisatellites are scored by performing Southern blots to
identify the number of tandem repeats present in a nucleic acid sample from the individual being
tested. However, there are only 10 potential VNTRs that can be typed by Southern blotting.
Moreover, both RFLP and VNTR markers are costly and time-consuming to develop and assay in
large numbers.

Single nucleotide polymorphisms or biallelic markers can be used in the same manner as
RFLPs and VNTRs but offer several advantages. SNPs are densely spaced in the human genome and
represent the most frequent type of variation. An estimated number of more than 10^7 sites are
scattered along the 3x10^9 base pairs of the human genome. Therefore, SNPs occur at a greater
frequency and with greater uniformity than RFLP or VNTR markers, which means that there is a
greater probability that such a marker will be found in close proximity to a genetic locus of interest.
SNPs are less variable than VNTR markers but are mutationally more stable.
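
The density claim above follows from simple arithmetic, sketched here in Python. The function name is hypothetical; the inputs are the 10^7 sites and 3x10^9 base pairs quoted in the text, and uniform spacing is assumed purely for the estimate.

```python
def average_marker_spacing(genome_length_bp: int, n_sites: int) -> float:
    """Average distance in base pairs between sites, assuming uniform spacing."""
    if n_sites <= 0:
        raise ValueError("number of sites must be positive")
    return genome_length_bp / n_sites


# 3x10^9 base pairs divided by 10^7 polymorphic sites.
spacing = average_marker_spacing(3 * 10**9, 10**7)
print(spacing)  # 300.0
```

Because the text speaks of more than 10^7 sites, roughly 300 base pairs is an upper estimate of the average spacing, which is why such a marker is likely to lie close to any locus of interest.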

Also, the different forms of a characterized single nucleotide polymorphism, such as the
biallelic markers of the present invention, are often easier to distinguish and can therefore be typed
easily on a routine basis. Biallelic markers have single nucleotide based alleles and they have only
two common alleles, which allows highly parallel detection and automated scoring. The biallelic
markers of the present invention offer the possibility of rapid, high throughput genotyping of a large
number of individuals.

Biallelic markers are densely spaced in the genome, sufficiently informative and can be
assayed in large numbers. The combined effects of these advantages make biallelic markers
extremely valuable in genetic studies. Biallelic markers can be used in linkage studies in families, in
allele sharing methods, in linkage disequilibrium studies in populations, in association studies of
case-control populations or of trait positive and trait negative populations. An important aspect of
the present invention is that biallelic markers allow association studies to be performed to identify
genes involved in complex traits. Association studies examine the frequency of marker alleles in
unrelated case- and control-populations and are generally employed in the detection of polygenic or
sporadic traits. Association studies may be conducted within the general population and are not
limited to studies performed on related individuals in affected families (linkage studies). Biallelic
markers in different genes can be screened in parallel for direct association with disease or response
to a treatment. This multiple gene approach is a powerful tool for a variety of human genetic studies
as it provides the necessary statistical power to examine the synergistic effect of multiple genetic
factors on a particular phenotype, drug response, sporadic trait, or disease state with a complex
genetic etiology.
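
A case-control association test of the kind described above compares allele counts at one biallelic marker between the two populations. The Python sketch below uses a Pearson chi-square test on a 2x2 table of allele counts. The counts are invented for illustration, the function names are hypothetical, and the test shown is one common choice rather than the method prescribed by the specification.

```python
CHI2_CRITICAL_1DF_P05 = 3.841  # chi-square critical value, 1 degree of freedom, p = 0.05


def allele_frequency(count_allele1: int, count_allele2: int) -> float:
    """Frequency of allele 1 among all chromosomes typed."""
    return count_allele1 / (count_allele1 + count_allele2)


def chi_square_2x2(a: int, b: int, c: int, d: int) -> float:
    """Pearson chi-square (no continuity correction) for the table [[a, b], [c, d]].

    Rows are cases and controls; columns are allele 1 and allele 2 counts.
    """
    n = a + b + c + d
    denominator = (a + b) * (c + d) * (a + c) * (b + d)
    if denominator == 0:
        raise ValueError("a row or column total is zero")
    return n * (a * d - b * c) ** 2 / denominator


# Hypothetical allele counts: 100 case chromosomes and 100 control chromosomes.
cases = (60, 40)
controls = (40, 60)
statistic = chi_square_2x2(*cases, *controls)
print(allele_frequency(*cases), allele_frequency(*controls), statistic)
print(statistic > CHI2_CRITICAL_1DF_P05)
```

When many markers are screened in parallel, as the multiple gene approach implies, the significance threshold would need to be adjusted for multiple testing.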

Although most valuable in association studies, the biallelic markers of the present
invention can have a wide range of uses, and may for example also be used in forensic identification
of individual humans, such as for identification of descendants, determination of paternity, criminal
identification, and the like. For example, a DNA sample is obtained from a person or from a cellular
sample (e.g., crime scene evidence such as blood, saliva, semen, and the like) and the identity of the
allele present at any one or preferably multiple biallelic markers is determined according to any of
the detection methods described herein. On the basis of the allele(s) present at the specified
positions, the individual from which the sample originated will be identified with respect to his/her
genotype. The biallelic markers of the invention may be used alone or in conjunction with other
genetic markers, including RFLP and VNTR to conclusively identify an individual or to rule out the
individual as a possible perpetrator.
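
The genotype comparison underlying such identification can be sketched in Python as below. The marker names follow the A1 to A58 naming used in this specification, but the alleles shown are hypothetical and the function is an illustrative assumption, not a described protocol.

```python
def genotypes_match(sample: dict, reference: dict) -> bool:
    """Compare unordered genotypes at every marker typed in both profiles."""
    shared_markers = set(sample) & set(reference)
    if not shared_markers:
        raise ValueError("no markers typed in both profiles")
    return all(sorted(sample[m]) == sorted(reference[m]) for m in shared_markers)


# Hypothetical genotypes at two biallelic markers.
evidence = {"A1": ("C", "T"), "A4": ("G", "G")}
person_1 = {"A1": ("T", "C"), "A4": ("G", "G")}
person_2 = {"A1": ("C", "C"), "A4": ("G", "G")}

print(genotypes_match(evidence, person_1))
print(genotypes_match(evidence, person_2))
```

A mismatch at any marker excludes an individual, whereas a match at only a few biallelic markers is weak evidence of identity, which is why typing multiple markers, optionally combined with RFLP and VNTR markers, is preferred.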

BAP28-Related Biallelic Markers And Polynucleotides Related Thereto 

The invention also concerns BAP28-related biallelic markers. A portion of the biallelic
markers of the present invention designated A1 to A58 are disclosed in Table 2, including their
location on the BAP28 gene. These biallelic markers are also each listed as a single base
polymorphism in the features of SEQ ID No 1.

The invention also relates to a purified and/or isolated nucleotide sequence comprising a
polymorphic base of a BAP28-related biallelic marker, preferably of a biallelic marker selected from
the group consisting of A1 to A58, more preferably one of the biallelic markers A1 to A27, A34,
A37 to A41, A43 to A49, A52, and A54 to A58, still more preferably one of the biallelic markers
A1, A4, 16, A30, A31, A42, A50, A51, and A53, and the complements thereof. The sequence is
between 8 and 1000 nucleotides in length, and preferably comprises at least 8, 10, 12, 15, 18, 20, 25,
35, 40, 50, 60, 70, 80, 100, 250, 500 or 1000 contiguous nucleotides of a nucleotide sequence
selected from the group consisting of SEQ ID Nos 1, 2 or 3, or a variant thereof or a complementary
sequence thereto. These nucleotide sequences comprise the polymorphic base of either allele 1 or
allele 2 of the respective biallelic marker. In some embodiments, said biallelic marker may be
within 6, 5, 4, 3, 2, or 1 nucleotides of the center of said polynucleotide or at the center of said
polynucleotide. In some embodiments, the 3' end of said contiguous span may be present at the 3'
end of said polynucleotide. In some embodiments, a BAP28-related biallelic marker
may be present at the 3' end of said polynucleotide. In some embodiments, the 3' end of said
polynucleotide may be located within or at least 2, 4, 6, 8, 10, 12, 15, 18, 20, 25, 50, 100, 250, 500,
or 1000 nucleotides upstream of a BAP28-related biallelic marker in said sequence. In some
embodiments, the 3' end of said polynucleotide may be located 1 nucleotide upstream of a
BAP28-related biallelic marker in said sequence. In some embodiments, said polynucleotide may further
comprise a label. In some embodiments, said polynucleotide can be attached to a solid support. In a
further embodiment, the polynucleotides defined above can be used alone or in any combination.

The invention further concerns a nucleic acid encoding the BAP28 protein, wherein said
nucleic acid comprises a polymorphic base of a biallelic marker selected from the group consisting
of A1 to A58 and the complements thereof, preferably A1 to A27, A34, A37 to A41, A43 to A49,
A52, and A54 to A58.

The invention also encompasses the use of any polynucleotide for, or any polynucleotide
for use in, determining the identity of one or more nucleotides at a BAP28-related biallelic marker.
In addition, the polynucleotides of the invention for use in determining the identity of one or more
nucleotides at a BAP28-related biallelic marker encompass polynucleotides with any further
limitation described in this disclosure, or those following, specified alone or in any combination: In
some embodiments, said BAP28-related biallelic marker is selected from the group consisting of A1
to A58, and the complements thereof, or the biallelic markers in linkage disequilibrium therewith;
In some embodiments, said BAP28-related biallelic marker is selected from the group consisting of
A1 to A27, A34, A37 to A41, A43 to A49, A52, and A54 to A58, and the complements thereof, or
the biallelic markers in linkage disequilibrium therewith; In some embodiments, said BAP28-related
biallelic marker is selected from the group consisting of A1, A4, 16, A30, A31, A42, A50, A51, and
A53, and the complements thereof, or the biallelic markers in linkage disequilibrium therewith; In
some embodiments, said polynucleotide may comprise a sequence disclosed in the present
specification; In some embodiments, said polynucleotide may comprise, consist of, or consist
essentially of any polynucleotide described in the present specification; In some embodiments, said
determining may be performed in a hybridization assay, sequencing assay, microsequencing assay,
or an enzyme-based mismatch detection assay; In some embodiments, said polynucleotide may be
attached to a solid support, array, or addressable array; In some embodiments, said polynucleotide
may be labeled. A preferred polynucleotide may be used in a hybridization assay for determining
the identity of the nucleotide at a BAP28-related biallelic marker. Another preferred polynucleotide
may be used in a sequencing or microsequencing assay for determining the identity of the nucleotide
at a BAP28-related biallelic marker. A third preferred polynucleotide may be used in an enzyme-
based mismatch detection assay for determining the identity of the nucleotide at a BAP28-related
biallelic marker. A fourth preferred polynucleotide may be used in amplifying a segment of
polynucleotides comprising a BAP28-related biallelic marker. In some embodiments, any of the
polynucleotides described above may be attached to a solid support, array, or addressable array; In
some embodiments, said polynucleotide may be labeled.

Additionally, the invention encompasses the use of any polynucleotide for, or any
polynucleotide for use in, amplifying a segment of nucleotides comprising a BAP28-related biallelic
marker. In addition, the polynucleotides of the invention for use in amplifying a segment of
nucleotides comprising a BAP28-related biallelic marker encompass polynucleotides with any
further limitation described in this disclosure, or those following, specified alone or in any
combination: In some embodiments, said BAP28-related biallelic marker is selected from the group
consisting of A1 to A58, and the complements thereof, or the biallelic markers in linkage
disequilibrium therewith; In some embodiments, said BAP28-related biallelic marker is
selected from the group consisting of A1 to A27, A34, A37 to A41, A43 to A49, A52, and A54 to
A58, and the complements thereof, or the biallelic markers in linkage disequilibrium therewith; In
some embodiments, said BAP28-related biallelic marker is selected from the group consisting of A1,
A4, 16, A30, A31, A42, A50, A51, and A53, and the complements thereof, or the biallelic markers
in linkage disequilibrium therewith; In some embodiments, said polynucleotide may comprise a
sequence disclosed in the present specification; In some embodiments, said polynucleotide may
comprise, consist of, or consist essentially of any polynucleotide described in the present
specification; In some embodiments, said amplifying may be performed by a PCR or LCR. In some
embodiments, said polynucleotide may be attached to a solid support, array, or addressable array. In
some embodiments, said polynucleotide may be labeled.

The primers for amplification or sequencing reaction of a polynucleotide comprising a
biallelic marker of the invention may be designed from the disclosed sequences for any method
known in the art. A preferred set of primers is fashioned such that the 3' end of the contiguous
span of identity with a sequence selected from the group consisting of SEQ ID Nos 1, 2 or 3, or a
sequence complementary thereto or a variant thereof is present at the 3' end of the primer. Such a
configuration allows the 3' end of the primer to hybridize to a selected nucleic acid sequence and
dramatically increases the efficiency of the primer for amplification or sequencing reactions. Allele
specific primers may be designed such that a polymorphic base of a biallelic marker is at the 3' end
of the contiguous span and the contiguous span is present at the 3' end of the primer. Such allele
specific primers tend to selectively prime an amplification or sequencing reaction so long as they are
used with a nucleic acid sample that contains one of the two alleles present at a biallelic marker.
The 3' end of the primer of the invention may be located within or at least 2, 4, 6, 8, 10, 12, 15, 18,
20, 25, 50, 100, 250, 500, or 1000 nucleotides upstream of a BAP28-related biallelic marker in said
sequence or at any other location which is appropriate for their intended use in sequencing,
amplification or the location of novel sequences or markers. Thus, another set of preferred
amplification primers comprise an isolated polynucleotide consisting essentially of a contiguous
span of 8 to 50 nucleotides in a sequence selected from the group consisting of SEQ ID Nos 1, 2 or 3
or a sequence complementary thereto or a variant thereof, wherein the 3' end of said contiguous span
is located at the 3' end of said polynucleotide, and wherein the 3' end of said polynucleotide is
located upstream of a BAP28-related biallelic marker in said sequence. Preferably, those
amplification primers comprise a sequence selected from the group consisting of the sequences B1
to B38 and C1 to C38, preferably B1 to B15, B22, B24, B25, B27 to 29, B32, B34 to B38, C1 to
C15, C22, C24, C25, C27 to 29, C32, and C34 to C38. Primers with their 3' ends located 1
nucleotide upstream of a biallelic marker of BAP28 have a special utility in microsequencing assays.
Preferred microsequencing primers are described in Table 4. In some embodiments,
microsequencing primers are selected from the group consisting of the nucleotide sequences D1 to
D58 and E1 to E58, preferably D1 to D27, D34, D37 to D41, D43 to D49, D52, D54 to D58, E1 to
E27, E34, E37 to E41, E43 to E49, E52, and E54 to E58.
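
The geometry of a microsequencing primer, whose 3' end lies one nucleotide upstream of the polymorphic base, can be illustrated with the Python sketch below. The toy sequence, the placeholder character `N` for the polymorphic base and the function names are all hypothetical; the actual primers are those listed in Table 4.

```python
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(sequence: str) -> str:
    """Reverse complement of an upper-case DNA sequence."""
    return sequence.translate(_COMPLEMENT)[::-1]


def forward_microsequencing_primer(sequence: str, marker_index: int, length: int) -> str:
    """Primer on the given strand whose 3' end is the base just 5' of the marker."""
    start = marker_index - length
    if length < 1 or start < 0 or marker_index >= len(sequence):
        raise ValueError("primer does not fit upstream of the marker")
    return sequence[start:marker_index]


def reverse_microsequencing_primer(sequence: str, marker_index: int, length: int) -> str:
    """Primer on the complementary strand whose 3' end abuts the marker."""
    end = marker_index + 1 + length
    if length < 1 or marker_index < 0 or end > len(sequence):
        raise ValueError("primer does not fit downstream of the marker")
    return reverse_complement(sequence[marker_index + 1:end])


# Toy sequence (5'->3') with the polymorphic base at 0-based index 10, shown as N.
toy = "ACGTACGTAC" + "N" + "TTGCA"
print(forward_microsequencing_primer(toy, 10, 4))
print(reverse_microsequencing_primer(toy, 10, 4))
```

In both orientations the first nucleotide incorporated by the polymerase is the polymorphic base or its complement, which is what allows a single-base extension to report the allele.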
The probes of the present invention may be designed from the disclosed sequences for any
method known in the art, particularly methods which allow for testing if a marker disclosed herein is
present. A preferred set of probes may be designed for use in the hybridization assays of the
invention in any manner known in the art such that they selectively bind to one allele of a biallelic
marker, but not the other under any particular set of assay conditions. Preferred hybridization
probes comprise the polymorphic base of either allele 1 or allele 2 of the considered biallelic marker.
In some embodiments, said biallelic marker may be within 6, 5, 4, 3, 2, or 1 nucleotides of the center
of the hybridization probe or at the center of said probe. In a preferred embodiment, the probes are
selected in the group consisting of the sequences P1 to P58 and the complementary sequence thereto
(Table 3), preferably P1 to P27, P34, P37 to P41, P43 to P49, P52, and P54 to P58.
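
Whether a marker lies within a given number of nucleotides of the center of a probe can be checked with a short calculation, sketched here in Python. The function names and the 25-nucleotide probe length are assumptions for illustration; for an even-length probe the center falls between two bases, so the offset is a half-integer.

```python
def offset_from_center(probe_length: int, marker_index: int) -> float:
    """Distance in nucleotides from the 0-based marker index to the probe center."""
    if not 0 <= marker_index < probe_length:
        raise ValueError("marker index outside the probe")
    return abs(marker_index - (probe_length - 1) / 2)


def marker_near_center(probe_length: int, marker_index: int, max_offset: int = 6) -> bool:
    """True if the marker is at the center or within max_offset nucleotides of it."""
    return offset_from_center(probe_length, marker_index) <= max_offset


# Hypothetical 25-nucleotide probe: index 12 is the exact center.
print(offset_from_center(25, 12))
print(marker_near_center(25, 18), marker_near_center(25, 19))
```

Placing the polymorphic base at or near the center generally maximizes the destabilizing effect of a single mismatch, which helps a probe discriminate between the two alleles.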
It should be noted that the polynucleotides of the present invention are not limited to
having the exact flanking sequences surrounding the polymorphic bases which are enumerated in
the Sequence Listing. Rather, it will be appreciated that the flanking sequences surrounding the biallelic
markers may be lengthened or shortened to any extent compatible with their intended use and the
present invention specifically contemplates such sequences. The flanking regions outside of the
contiguous span need not be homologous to native flanking sequences which actually occur in
human subjects. The addition of any nucleotide sequence which is compatible with the nucleotide's
intended use is specifically contemplated.

Primers and probes may be labeled or immobilized on a solid support as described in
"Oligonucleotide probes and primers". The polynucleotides of the invention which are attached to a
solid support encompass polynucleotides with any further limitation described in this disclosure, or
those following, specified alone or in any combination: In some embodiments, said polynucleotides
may be specified as attached individually or in groups of at least 2, 5, 8, 10, 12, 15, 20, or 25 distinct
polynucleotides of the invention to a single solid support. In some embodiments, polynucleotides
other than those of the invention may be attached to the same solid support as polynucleotides of the
invention. In some embodiments, when multiple polynucleotides are attached to a solid support they
may be attached at random locations, or in an ordered array. In some embodiments, said ordered
array may be addressable.

The present invention also encompasses diagnostic kits comprising one or more
polynucleotides of the invention with a portion or all of the necessary reagents and instructions for
genotyping a test subject by determining the identity of a nucleotide at a BAP28-related biallelic
marker. The polynucleotides of a kit may optionally be attached to a solid support, or be part of an
array or addressable array of polynucleotides. The kit may provide for the determination of the
identity of the nucleotide at a marker position by any method known in the art including, but not
limited to, a sequencing assay method, a microsequencing assay method, a hybridization assay
method, or an enzyme-based mismatch detection assay method.

Methods For De Novo Identification Of Biallelic Markers

Any of a variety of methods can be used to screen a genomic fragment for single
nucleotide polymorphisms, such as differential hybridization with oligonucleotide probes, detection
of changes in the mobility measured by gel electrophoresis, or direct sequencing of the amplified
nucleic acid. A preferred method for identifying biallelic markers involves comparative sequencing
of genomic DNA fragments from an appropriate number of unrelated individuals.

In a first embodiment, DNA samples from unrelated individuals are pooled together,
following which the genomic DNA of interest is amplified and sequenced. The nucleotide
sequences thus obtained are then analyzed to identify significant polymorphisms. One of the major
advantages of this method resides in the fact that the pooling of the DNA samples substantially
reduces the number of DNA amplification reactions and sequencing reactions that must be carried
out. Moreover, this method is sufficiently sensitive so that a biallelic marker obtained thereby
usually demonstrates a sufficient frequency of its less common allele to be useful in conducting
association studies.

In a second embodiment, the DNA samples are not pooled and are therefore amplified and
sequenced individually. This method is usually preferred when biallelic markers need to be
identified in order to perform association studies within candidate genes. Preferably, highly relevant
gene regions such as promoter regions or exon regions may be screened for biallelic markers. A
biallelic marker obtained using this method may show a lower degree of informativeness for
conducting association studies, e.g. if the frequency of its less frequent allele is less than about
10%. Such a biallelic marker will, however, be sufficiently informative to conduct association
studies, and it will further be appreciated that including less informative biallelic markers in the
genetic analysis studies of the present invention may allow in some cases the direct identification of
causal mutations, which may, depending on their penetrance, be rare mutations.

The following is a description of the various parameters of a preferred method used by the
inventors for the identification of the biallelic markers of the present invention.

Genomic DNA Samples 

The genomic DNA samples from which the biallelic markers of the present invention are
generated are preferably obtained from unrelated individuals corresponding to a heterogeneous
population of known ethnic background. The number of individuals from whom DNA samples are
obtained can vary substantially, preferably from about 10 to about 1000, more preferably from about 50 to
about 200 individuals. It is usually preferred to collect DNA samples from at least about 100
individuals in order to have sufficient polymorphic diversity in a given population to identify as
many markers as possible and to generate statistically significant results.

As for the source of the genomic DNA to be subjected to analysis, any test sample can be
foreseen without any particular limitation. These test samples include biological samples, which can
be tested by the methods of the present invention described herein, and include human and animal
body fluids such as whole blood, serum, plasma, cerebrospinal fluid, urine, lymph fluids, and
various external secretions of the respiratory, intestinal and genitourinary tracts, tears, saliva, milk,
white blood cells, myelomas and the like; biological fluids such as cell culture supernatants; fixed
tissue specimens including tumor and non-tumor tissue and lymph node tissues; bone marrow
aspirates and fixed cell specimens. The preferred source of genomic DNA used in the present
invention is from peripheral venous blood of each donor. Techniques to prepare genomic DNA
from biological samples are well known to the skilled technician. Details of a preferred embodiment
are provided in Example 1. The person skilled in the art can choose to amplify pooled or unpooled
DNA samples.

DNA Amplification 

The identification of biallelic markers in a sample of genomic DNA may be facilitated
through the use of DNA amplification methods. DNA samples can be pooled or unpooled for the
amplification step. DNA amplification techniques are well known to those skilled in the art.
Various methods to amplify DNA fragments carrying biallelic markers are further described
hereinbefore in "Amplification of the BAP28 gene". The PCR technology is the preferred
amplification technique used to identify new biallelic markers. A typical example of a PCR reaction
suitable for the purposes of the present invention is provided in Example 2.

In a first embodiment of the present invention, biallelic markers are identified using
genomic sequence information generated by the inventors. Sequenced genomic DNA fragments are
used to design primers for the amplification of 500 bp fragments. These 500 bp fragments are
amplified from genomic DNA and are scanned for biallelic markers. Primers may be designed using
the OSP software (Hillier L. and Green P., 1991). All primers may contain, upstream of the specific
target bases, a common oligonucleotide tail that serves as a sequencing primer. Those skilled in the
art are familiar with primer extensions, which can be used for these purposes.

Preferred primers, useful for the amplification of genomic sequences encoding the
candidate genes, focus on promoters, exons and splice sites of the genes. A biallelic marker has
a higher probability of being a causal mutation if it is located in these functional regions of
the gene. Preferred amplification primers of the invention include the nucleotide sequences B1 to
B38 and C1 to C38, preferably B1 to B15, B22, B24, B25, B27 to 29, B32, B34 to B38, C1 to C15,
C22, C24, C25, C27 to 29, C32, and C34 to C38, detailed further in Example 2, Table 1.

Sequencing Of Amplified Genomic DNA And Identification Of Single Nucleotide Polymorphisms

The amplification products generated as described above are then sequenced using any
method known and available to the skilled technician. Methods for sequencing DNA using either
the dideoxy-mediated method (Sanger method) or the Maxam-Gilbert method are widely known to
those of ordinary skill in the art. Such methods are for example disclosed in Sambrook et al.(1989).
Alternative approaches include hybridization to high-density DNA probe arrays as described in Chee
et al.(1996).

Preferably, the amplified DNA is subjected to automated dideoxy terminator sequencing
reactions using a dye-primer cycle sequencing protocol. The products of the sequencing reactions
are run on sequencing gels and the sequences are determined using gel image analysis. The
polymorphism search is based on the presence of superimposed peaks in the electrophoresis pattern
resulting from different bases occurring at the same position. Because each dideoxy terminator is
labeled with a different fluorescent molecule, the two peaks corresponding to a biallelic site present
distinct colors corresponding to two different nucleotides at the same position on the sequence.
However, the presence of two peaks can be an artifact due to background noise. To exclude such an
artifact, the two DNA strands are sequenced and a comparison between the peaks is carried out. In
order to be registered as a polymorphic sequence, the polymorphism has to be detected on both
strands.

The above procedure permits those amplification products which contain biallelic markers
to be identified. The detection limit for the frequency of biallelic polymorphisms detected by
sequencing pools of 100 individuals is approximately 0.1 for the minor allele, as verified by
sequencing pools of known allelic frequencies. However, more than 90% of the biallelic
polymorphisms detected by the pooling method have a frequency for the minor allele higher than
0.25. Therefore, the biallelic markers selected by this method have a frequency of at least 0.1 for the
minor allele and less than 0.9 for the major allele, preferably at least 0.2 for the minor allele and less
than 0.8 for the major allele, more preferably at least 0.3 for the minor allele and less than 0.7 for the
major allele, thus a heterozygosity rate higher than 0.18, preferably higher than 0.32, more
preferably higher than 0.42.
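
The heterozygosity rates quoted above follow from the allele frequencies if genotypes are assumed to occur in Hardy-Weinberg proportions, an assumption the text does not state explicitly. Under that assumption, the expected heterozygosity of a biallelic marker with minor allele frequency q is 2q(1 - q). The following is a minimal sketch only; the function name is illustrative and is not part of the disclosed method.

```python
def expected_heterozygosity(minor_allele_frequency):
    """Expected heterozygosity 2q(1 - q) of a biallelic marker.

    Assumes Hardy-Weinberg proportions (an assumption of this sketch);
    q is the frequency of the less common allele.
    """
    q = minor_allele_frequency
    if not 0.0 <= q <= 0.5:
        raise ValueError("minor allele frequency must lie between 0 and 0.5")
    return 2.0 * q * (1.0 - q)


# Minor allele frequency thresholds named in the text.
for q in (0.1, 0.2, 0.3):
    print(f"q = {q:.1f}: heterozygosity = {expected_heterozygosity(q):.2f}")
```

For minor allele frequencies of 0.1, 0.2 and 0.3 this yields 0.18, 0.32 and 0.42, the three heterozygosity thresholds given above.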

In another embodiment, biallelic markers are detected by sequencing individual DNA
samples; the frequency of the minor allele of such a biallelic marker may be less than 0.1.

Validation Of The Biallelic Markers Of The Present Invention

The polymorphisms are evaluated for their usefulness as genetic markers by validating that
both alleles are present in a population. Validation of the biallelic markers is accomplished by
genotyping a group of individuals by a method of the invention and demonstrating that both alleles
are present. Microsequencing is a preferred method of genotyping alleles. The validation by
genotyping step may be performed on individual samples derived from each individual in the group
or by genotyping a pooled sample derived from more than one individual. The group can be as
small as one individual if that individual is heterozygous for the allele in question. Preferably the
group contains at least three individuals, more preferably the group contains five or six individuals,
so that a single validation test will be more likely to result in the validation of more of the biallelic
markers that are being tested. It should be noted, however, that when the validation test is
performed on a small group it may give a false negative result if, as a result of sampling error,
none of the individuals tested carries one of the two alleles. Thus, the validation process is less
useful in demonstrating that a particular initial result is an artifact than it is at demonstrating that
there is a bona fide biallelic marker at a particular position in a sequence. All of the genotyping,
haplotyping, association, and interaction study methods of the invention may optionally be
performed solely with validated biallelic markers.
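
The risk of such a false negative can be estimated. The sketch below assumes that the tested individuals are unrelated, that genotypes follow Hardy-Weinberg proportions, and that each individual contributes two independently sampled chromosomes; none of these assumptions is stated in the text, and the function name is illustrative. Under them, a group of n individuals displays only one of the two alleles with probability (1 - q)^(2n) + q^(2n), where q is the frequency of the less common allele.

```python
def probability_only_one_allele_seen(minor_allele_frequency, n_individuals):
    """Probability that a validation group shows a single allele only.

    Illustrative model: 2n chromosomes drawn independently from a
    population in which the less common allele has frequency q.
    """
    q = minor_allele_frequency
    chromosomes = 2 * n_individuals
    # Either every chromosome carries the common allele, or every
    # chromosome carries the less common allele.
    return (1.0 - q) ** chromosomes + q ** chromosomes


for n in (1, 3, 6, 20):
    risk = probability_only_one_allele_seen(0.1, n)
    print(f"group of {n:2d}: probability of a false negative = {risk:.3f}")
```

With a single individual the result equals the probability that the individual is homozygous, which is consistent with the statement that one individual suffices only if heterozygous.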

Evaluation Of The Frequency Of The Biallelic Markers Of The Present Invention 

The validated biallelic markers are further evaluated for their usefulness as genetic markers
by determining the frequency of the least common allele at the biallelic marker site. The higher the
frequency of the less common allele, the greater the usefulness of the biallelic marker in association
and interaction studies. The determination of the frequency of the least common allele is accomplished by
genotyping a group of individuals by a method of the invention and demonstrating that both alleles
are present. This determination of frequency by genotyping step may be performed on individual
samples derived from each individual in the group or by genotyping a pooled sample derived from
more than one individual. The group must be large enough to be representative of the population as
a whole. Preferably the group contains at least 20 individuals, more preferably the group contains at
least 50 individuals, most preferably the group contains at least 100 individuals. Of course the larger
the group the greater the accuracy of the frequency determination because of reduced sampling error.
A biallelic marker wherein the frequency of the less common allele is 30% or more is termed a "high
quality biallelic marker." All of the genotyping, haplotyping, association, and interaction study
methods of the invention may optionally be performed solely with high quality biallelic markers.
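
The frequency determination and the effect of group size on sampling error can be sketched as follows. The sketch assumes individually genotyped, unrelated subjects and a binomial sampling model over 2n chromosomes; the function names and the genotype counts in the example are hypothetical and serve only to illustrate the arithmetic.

```python
import math


def minor_allele_frequency(n_hom_allele1, n_het, n_hom_allele2):
    """Frequency of the less common allele from genotype counts."""
    n_individuals = n_hom_allele1 + n_het + n_hom_allele2
    if n_individuals == 0:
        raise ValueError("at least one genotyped individual is required")
    freq_allele1 = (2 * n_hom_allele1 + n_het) / (2 * n_individuals)
    return min(freq_allele1, 1.0 - freq_allele1)


def standard_error(frequency, n_individuals):
    """Binomial standard error of an allele frequency over 2n chromosomes."""
    return math.sqrt(frequency * (1.0 - frequency) / (2 * n_individuals))


def is_high_quality(frequency, threshold=0.30):
    """True if the less common allele reaches the 30% threshold of the text."""
    return frequency >= threshold


# Hypothetical counts: 9 homozygous for allele 1, 42 heterozygous,
# 49 homozygous for allele 2.
q = minor_allele_frequency(9, 42, 49)
print(f"minor allele frequency = {q:.2f}, high quality: {is_high_quality(q)}")
for n in (20, 50, 100):
    print(f"group of {n:3d}: standard error = {standard_error(q, n):.3f}")
```

The standard error shrinks with the square root of the number of chromosomes sampled, which is why the larger groups named above give a more accurate frequency determination.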

Methods For Genotyping An Individual For Biallelic Markers

Methods are provided to genotype a biological sample for one or more biallelic markers of
the present invention, all of which may be performed in vitro. Such methods of genotyping
comprise determining the identity of a nucleotide at a BAP28 biallelic marker site by any method
known in the art. These methods find use in genotyping case-control populations in association
studies as well as individuals in the context of detection of alleles of biallelic markers which are
known to be associated with a given trait, in which case both copies of the biallelic marker present in
the individual's genome are determined so that an individual may be classified as homozygous or
heterozygous for a particular allele.

These genotyping methods can be performed on nucleic acid samples derived from a single
individual or pooled DNA samples.

Genotyping can be performed using similar methods as those described above for the
identification of the biallelic markers, or using other genotyping methods such as those further
described below. In preferred embodiments, the comparison of sequences of amplified genomic
fragments from different individuals is used to identify new biallelic markers whereas
microsequencing is used for genotyping known biallelic markers in diagnostic and association study
applications.

In one embodiment the invention encompasses methods of genotyping comprising
determining the identity of a nucleotide at a BAP28-related biallelic marker or the complement
thereof in a biological sample; In some embodiments, said BAP28-related biallelic marker is selected
from the group consisting of A1 to A58, and the complements thereof, or the biallelic markers in
linkage disequilibrium therewith; In some embodiments, said BAP28-related biallelic
marker is selected from the group consisting of A1 to A27, A34, A37 to A41, A43 to A49, A52, and
A54 to A58, and the complements thereof, or the biallelic markers in linkage disequilibrium
therewith; In some embodiments, said BAP28-related biallelic marker is selected from the group
consisting of A1, A4, A16, A30, A31, A42, A50, A51, and A53, and the complements thereof, or the
biallelic markers in linkage disequilibrium therewith; In some embodiments, said biological sample
is derived from a single subject; In some embodiments, the identity of the nucleotides at said
biallelic marker is determined for both copies of said biallelic marker present in said individual's
genome; In some embodiments, said biological sample is derived from multiple subjects; In some
embodiments, the method further comprises amplifying a portion of said sequence comprising the
biallelic marker prior to said determining step; In some embodiments, said amplifying is performed
by PCR; In some embodiments, said determining is performed by a hybridization assay, a
sequencing assay, a microsequencing assay, or an enzyme-based mismatch detection assay.

Source of DNA for genotyping

Any source of nucleic acids, in purified or non-purified form, can be utilized as the starting
nucleic acid, provided it contains or is suspected of containing the specific nucleic acid sequence
desired. DNA or RNA may be extracted from cells, tissues, body fluids and the like as described
above. While nucleic acids for use in the genotyping methods of the invention can be derived from
any mammalian source, the test subjects and individuals from which nucleic acid samples are taken
are generally understood to be human.

Amplification Of DNA Fragments Comprising Biallelic Markers 

Methods and polynucleotides are provided to amplify a segment of nucleotides comprising
one or more biallelic markers of the present invention. It will be appreciated that amplification of
DNA fragments comprising biallelic markers may be used in various methods and for various
purposes and is not restricted to genotyping. Nevertheless, many genotyping methods, although not
all, require the previous amplification of the DNA region carrying the biallelic marker of interest.
Such methods specifically increase the concentration or total number of sequences that span the
biallelic marker or include that site and sequences located either distal or proximal to it. Diagnostic
assays may also rely on amplification of DNA segments carrying a biallelic marker of the present
invention. Amplification of DNA may be achieved by any method known in the art. Amplification
techniques are described above in the section entitled "Amplification of the BAP28 gene".

Some of these amplification methods are particularly suited for the detection of single
nucleotide polymorphisms and allow the simultaneous amplification of a target sequence and the
identification of the polymorphic nucleotide as it is further described below.

The identification of biallelic markers as described above allows the design of appropriate
oligonucleotides, which can be used as primers to amplify DNA fragments comprising the biallelic
markers of the present invention.

In some embodiments the present invention provides primers for amplifying a DNA
fragment containing one or more biallelic markers of the present invention.

The spacing of the primers determines the length of the segment to be amplified. In the
context of the present invention, amplified segments carrying biallelic markers can range in size
from at least about 25 bp to 35 kbp. Amplification fragments from 25-3000 bp are typical,
fragments from 50-1000 bp are preferred and fragments from 100-600 bp are highly preferred. It
will be appreciated that amplification primers for the biallelic markers may be any sequence which
allows the specific amplification of any DNA fragment carrying the markers. Amplification primers
may be labeled or immobilized on a solid support as described in "Oligonucleotide probes and
primers".
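
The relation between primer spacing and fragment size can be illustrated with a short sketch. It assumes 1-based inclusive coordinates on a reference sequence for the 5' ends of the two primers; the function names, the coordinate convention and the category labels are assumptions made for illustration, while the size ranges are those given above.

```python
def amplicon_length(forward_primer_start, reverse_primer_end):
    """Length in bp of the segment bounded by the 5' ends of both primers.

    Coordinates are 1-based and inclusive (an assumption of this sketch).
    """
    if reverse_primer_end < forward_primer_start:
        raise ValueError("reverse primer must lie downstream of forward primer")
    return reverse_primer_end - forward_primer_start + 1


def size_category(length_bp):
    """Classify a fragment length against the size ranges named in the text."""
    if 100 <= length_bp <= 600:
        return "highly preferred"
    if 50 <= length_bp <= 1000:
        return "preferred"
    if 25 <= length_bp <= 3000:
        return "typical"
    if 25 <= length_bp <= 35000:
        return "within the contemplated range"
    return "outside the contemplated range"


length = amplicon_length(1001, 1500)
print(length, size_category(length))
```

In this example the primers are spaced so as to bound a 500 bp fragment, the size used for the marker discovery amplifications described earlier, which falls in the highly preferred range.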

Methods of Genotyping DNA samples for Biallelic Markers

Any method known in the art can be used to identify the nucleotide present at a biallelic
marker site. Since the biallelic marker allele to be detected has been identified and specified in the
present invention, detection will prove simple for one of ordinary skill in the art by employing any
of a number of techniques. Many genotyping methods require the previous amplification of the
DNA region carrying the biallelic marker of interest. While the amplification of target or signal is
often preferred at present, ultrasensitive detection methods which do not require amplification are
also encompassed by the present genotyping methods. Methods well-known to those skilled in the
art that can be used to detect biallelic polymorphisms include methods such as conventional dot blot
analyses, single strand conformational polymorphism analysis (SSCP) described by Orita et
al.(1989), denaturing gradient gel electrophoresis (DGGE), heteroduplex analysis, mismatch
cleavage detection, and other conventional techniques as described in Sheffield et al.(1991), White
et al.(1992), Grompe et al.(1989 and 1993). Another method for determining the identity of the
nucleotide present at a particular polymorphic site employs a specialized exonuclease-resistant
nucleotide derivative as described in US patent 4,656,127.

Preferred methods involve directly determining the identity of the nucleotide present at a
biallelic marker site by sequencing assay, enzyme-based mismatch detection assay, or hybridization
assay. The following is a description of some preferred methods. A highly preferred method is the
microsequencing technique. The term "sequencing" is used herein to refer to polymerase extension
of duplex primer/template complexes and includes both traditional sequencing and microsequencing.

1) Sequencing Assays 

The nucleotide present at a polymorphic site can be determined by sequencing methods. In
a preferred embodiment, DNA samples are subjected to PCR amplification before sequencing as
described above. DNA sequencing methods are described in "Sequencing Of Amplified Genomic
DNA And Identification Of Single Nucleotide Polymorphisms".

Preferably, the amplified DNA is subjected to automated dideoxy terminator sequencing 
reactions using a dye-primer cycle sequencing protocol. Sequence analysis allows the identification 
of the base present at the biallelic marker site. 

2) Microsequencing Assays 

In microsequencing methods, the nucleotide at a polymorphic site in a target DNA is
detected by a single nucleotide primer extension reaction. This method involves appropriate
microsequencing primers which hybridize just upstream of the polymorphic base of interest in the
target nucleic acid. A polymerase is used to specifically extend the 3' end of the primer with one
single ddNTP (chain terminator) complementary to the nucleotide at the polymorphic site. Next, the
identity of the incorporated nucleotide is determined in any suitable way.
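
The primer placement described above can be sketched in code. The example below is illustrative only: the sequence, the marker position, the primer length and all function names are hypothetical and are not taken from the Sequence Listing or from Table 4. It derives, for a given sense-strand sequence, a primer on each strand whose 3' end lies immediately adjacent to the polymorphic base, and reports which ddNTP would be incorporated for a given sense-strand allele.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Reverse complement of a DNA sequence written 5' to 3'."""
    return seq.translate(COMPLEMENT)[::-1]


def microsequencing_primers(sequence, site, length=19):
    """Return (forward, reverse) primers, each written 5' to 3'.

    `sequence` is the sense strand and `site` the 0-based index of the
    polymorphic base. Both primers end at their 3' terminus on the base
    immediately adjacent to the polymorphic position.
    """
    if site < length or site + length >= len(sequence):
        raise ValueError("not enough flanking sequence for this primer length")
    forward = sequence[site - length:site]
    reverse = reverse_complement(sequence[site + 1:site + 1 + length])
    return forward, reverse


def incorporated_ddntp(sense_allele, strand="forward"):
    """Base of the ddNTP added to the primer for a given sense-strand allele."""
    if strand == "forward":
        # The forward primer copies the antisense template, so the added
        # terminator has the same base as the sense-strand allele.
        return sense_allele
    return sense_allele.translate(COMPLEMENT)


# Hypothetical 21-base sequence with the polymorphic base (N) at index 10.
seq = "ACGTACGTACNTTGCATGCAA"
fwd, rev = microsequencing_primers(seq, site=10, length=5)
print(fwd, rev, incorporated_ddntp("G", "forward"), incorporated_ddntp("G", "reverse"))
```

A heterozygous sample would yield two different incorporated terminators in the same reaction, one per allele.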

Typically, microsequencing reactions are carried out using fluorescent ddNTPs and the
extended microsequencing primers are analyzed by electrophoresis on ABI 377 sequencing
machines to determine the identity of the incorporated nucleotide as described in EP 412 883.
Alternatively, capillary electrophoresis can be used in order to process a higher number of assays
simultaneously. An example of a typical microsequencing procedure that can be used in the context
of the present invention is provided in Example 4.

Different approaches can be used for the labeling and detection of ddNTPs. A
homogeneous phase detection method based on fluorescence resonance energy transfer has been
described by Chen and Kwok (1997) and Chen et al.(1997). In this method, amplified genomic
DNA fragments containing polymorphic sites are incubated with a 5'-fluorescein-labeled primer in
the presence of allelic dye-labeled dideoxyribonucleoside triphosphates and a modified Taq
polymerase. The dye-labeled primer is extended one base by the dye-terminator specific for the
allele present on the template. At the end of the genotyping reaction, the fluorescence intensities of
the two dyes in the reaction mixture are analyzed directly without separation or purification. All
these steps can be performed in the same tube and the fluorescence changes can be monitored in real
time. Alternatively, the extended primer may be analyzed by MALDI-TOF Mass Spectrometry.
The base at the polymorphic site is identified by the mass added onto the microsequencing primer
(see Haff and Smirnov, 1997).

Microsequencing may be achieved by the established microsequencing method or by
developments or derivatives thereof. Alternative methods include several solid-phase
microsequencing techniques. The basic microsequencing protocol is the same as described
previously, except that the method is conducted as a heterogeneous phase assay, in which the primer
or the target molecule is immobilized or captured onto a solid support. To simplify the primer
separation and the terminal nucleotide addition analysis, oligonucleotides are attached to solid
supports or are modified in such ways that permit affinity separation as well as polymerase
extension. The 5' ends and internal nucleotides of synthetic oligonucleotides can be modified in a
number of different ways to permit different affinity separation approaches, e.g., biotinylation. If a
single affinity group is used on the oligonucleotides, the oligonucleotides can be separated from the
incorporated terminator reagent. This eliminates the need for physical or size separation. More than
one oligonucleotide can be separated from the terminator reagent and analyzed simultaneously if
more than one affinity group is used. This permits the analysis of several nucleic acid species or
more nucleic acid sequence information per extension reaction. The affinity group need not be on
the priming oligonucleotide but could alternatively be present on the template. For example,
immobilization can be carried out via an interaction between biotinylated DNA and streptavidin-
coated microtitration wells or avidin-coated polystyrene particles. In the same manner,
oligonucleotides or templates may be attached to a solid support in a high-density format. In such
solid phase microsequencing reactions, incorporated ddNTPs can be radiolabeled (Syvanen, 1994)
or linked to fluorescein (Livak and Hainer, 1994). The detection of radiolabeled ddNTPs can be
achieved through scintillation-based techniques. The detection of fluorescein-linked ddNTPs can be
based on the binding of antifluorescein antibody conjugated with alkaline phosphatase, followed by
incubation with a chromogenic substrate (such as p-nitrophenyl phosphate). Other possible reporter-
detection pairs include: ddNTP linked to dinitrophenyl (DNP) and anti-DNP alkaline phosphatase
conjugate (Harju et al., 1993) or biotinylated ddNTP and horseradish peroxidase-conjugated
streptavidin with o-phenylenediamine as a substrate (WO 92/15712). As yet another alternative
solid-phase microsequencing procedure, Nyren et al.(1993) described a method relying on the
detection of DNA polymerase activity by an enzymatic luminometric inorganic pyrophosphate
detection assay (ELIDA).

Pastinen et al.(1997) describe a method for multiplex detection of single nucleotide
polymorphism in which the solid phase minisequencing principle is applied to an oligonucleotide
array format. High-density arrays of DNA probes attached to a solid support (DNA chips) are
further described below.

In one aspect the present invention provides polynucleotides and methods to genotype one
or more biallelic markers of the present invention by performing a microsequencing assay. Preferred
microsequencing primers include the nucleotide sequences D1 to D58 and E1 to E58, preferably D1
to D27, D34, D37 to D41, D43 to D49, D52, D54 to D58, E1 to E27, E34, E37 to E41, E43 to E49,
E52, and E54 to E58. It will be appreciated that the microsequencing primers listed in Example 4 are
merely exemplary and that any primer having a 3' end immediately adjacent to the polymorphic
nucleotide may be used. Similarly, it will be appreciated that microsequencing analysis may be
performed for any biallelic marker or any combination of biallelic markers of the present invention.
One aspect of the present invention is a solid support which includes one or more microsequencing
primers listed in Example 4, or fragments comprising at least 8, 12, 15, 20, 25, 30, 40, or 50
consecutive nucleotides thereof and having a 3' terminus immediately upstream of the
corresponding biallelic marker, for determining the identity of a nucleotide at a biallelic marker site.

3) Mismatch detection assays based on polymerases and ligases

In one aspect the present invention provides polynucleotides and methods to determine the
allele of one or more biallelic markers of the present invention in a biological sample, by mismatch
detection assays based on polymerases and/or ligases. These assays are based on the specificity of
polymerases and ligases. Polymerization reactions place particularly stringent requirements on
correct base pairing of the 3' end of the amplification primer, and the joining of two oligonucleotides
hybridized to a target DNA sequence is quite sensitive to mismatches close to the ligation site,
especially at the 3' end. Methods, primers and various parameters to amplify DNA fragments
comprising biallelic markers of the present invention are further described above in "Amplification
Of DNA Fragments Comprising Biallelic Markers".

Allele Specific Amplification Primers

Discrimination between the two alleles of a biallelic marker can also be achieved by allele
specific amplification, a selective strategy whereby one of the alleles is amplified without
amplification of the other allele. This is accomplished by placing the polymorphic base at the 3' end
of one of the amplification primers. Because extension proceeds from the 3' end of the primer, a
mismatch at or near this position has an inhibitory effect on amplification. Therefore, under
appropriate amplification conditions, these primers only direct amplification on their complementary
allele. Determining the precise location of the mismatch and the corresponding assay conditions is
well within the ordinary skill in the art.
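
The placement of the polymorphic base at the 3' end of the primer can be illustrated as follows. This is a simplified sketch under stated assumptions: the sequence and position are hypothetical, the function names are illustrative, and the match test is an idealization, since actual discrimination depends on the amplification conditions as noted above.

```python
def allele_specific_primers(sequence, site, alleles, length=20):
    """Return {allele: primer}; each primer carries its allele at the 3' end.

    `sequence` is the sense strand, `site` the 0-based index of the
    polymorphic base, and `alleles` the two alternative bases.
    """
    if site < length - 1:
        raise ValueError("not enough upstream sequence for this primer length")
    upstream = sequence[site - (length - 1):site]
    return {allele: upstream + allele for allele in alleles}


def primer_directs_amplification(primer, sense_allele):
    """Idealized rule: extension occurs only if the 3' base matches the allele."""
    return primer[-1] == sense_allele


# Hypothetical sequence with an A/G polymorphism at index 10.
primers = allele_specific_primers("ACGTACGTACNTTGCATGCAA", 10, "AG", length=6)
for allele, primer in primers.items():
    amplifies = primer_directs_amplification(primer, "G")
    print(f"primer for allele {allele}: {primer}, amplifies a G template: {amplifies}")
```

Running both allele-specific reactions on the same sample would, under this idealized rule, give product in one reaction for a homozygote and in both reactions for a heterozygote.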

Ligation/Amplification Based Methods 

The "Oligonucleotide Ligation Assay" (OLA) uses two oligonucleotides which are
designed to be capable of hybridizing to abutting sequences of a single strand of a target molecule.
One of the oligonucleotides is biotinylated, and the other is detectably labeled. If the precise
complementary sequence is found in a target molecule, the oligonucleotides will hybridize such that
their termini abut, and create a ligation substrate that can be captured and detected. OLA is capable
of detecting single nucleotide polymorphisms and may be advantageously combined with PCR as
described by Nickerson et al.(1990). In this method, PCR is used to achieve the exponential
amplification of target DNA, which is then detected using OLA.

Other amplification methods which are particularly suited for the detection of single
nucleotide polymorphisms include LCR (ligase chain reaction) and Gap LCR (GLCR), which are
described above in "Amplification of the BAP28 gene". LCR uses two pairs of probes to
exponentially amplify a specific target. The sequence of each pair of oligonucleotides is selected
to permit the pair to hybridize to abutting sequences of the same strand of the target. Such
hybridization forms a substrate for a template-dependent ligase. In accordance with the present
invention, LCR can be performed with oligonucleotides having the proximal and distal sequences of
the same strand of a biallelic marker site. In one embodiment, either oligonucleotide will be
designed to include the biallelic marker site. In such an embodiment, the reaction conditions are
selected such that the oligonucleotides can be ligated together only if the target molecule either
contains or lacks the specific nucleotide that is complementary to the biallelic marker on the
oligonucleotide. In an alternative embodiment, the oligonucleotides will not include the biallelic
marker, such that when they hybridize to the target molecule, a "gap" is created as described in WO
90/01069. This gap is then "filled" with complementary dNTPs (as mediated by DNA polymerase),
or by an additional pair of oligonucleotides. Thus at the end of each cycle, each single strand has a
complement capable of serving as a target during the next cycle and exponential allele-specific
amplification of the desired sequence is obtained.

Ligase/Polymerase-mediated Genetic Bit Analysis™ is another method for determining the
identity of a nucleotide at a preselected site in a nucleic acid molecule (WO 95/21271). This method
involves the incorporation of a nucleoside triphosphate that is complementary to the nucleotide
present at the preselected site onto the terminus of a primer molecule, and their subsequent ligation
to a second oligonucleotide. The reaction is monitored by detecting a specific label attached to the
reaction's solid phase or by detection in solution.
4) Hybridization Assay Methods 

A preferred method of determining the identity of the nucleotide present at a biallelic
marker site involves nucleic acid hybridization. The hybridization probes, which can be
conveniently used in such reactions, preferably include the probes defined herein. Any hybridization
assay may be used including Southern hybridization, Northern hybridization, dot blot hybridization
and solid-phase hybridization (see Sambrook et al., 1989).

Hybridization refers to the formation of a duplex structure by two single stranded nucleic
acids due to complementary base pairing. Hybridization can occur between exactly complementary
nucleic acid strands or between nucleic acid strands that contain minor regions of mismatch.
Specific probes can be designed that hybridize to one form of a biallelic marker and not to the other
and therefore are able to discriminate between different allelic forms. Allele-specific probes are
often used in pairs, one member of a pair showing a perfect match to a target sequence containing the
original allele and the other showing a perfect match to the target sequence containing the alternative
allele. Hybridization conditions should be sufficiently stringent that there is a significant difference
in hybridization intensity between alleles, and preferably an essentially binary response, whereby a
probe hybridizes to only one of the alleles. Stringent, sequence specific hybridization conditions,
under which a probe will hybridize only to the exactly complementary target sequence, are well
known in the art (Sambrook et al., 1989). Stringent conditions are sequence dependent and will be
different in different circumstances. Generally, stringent conditions are selected to be about 5°C
lower than the thermal melting point (Tm) for the specific sequence at a defined ionic strength and
pH. Although such hybridizations can be performed in solution, it is preferred to employ a
solid-phase hybridization assay. The target DNA comprising a biallelic marker of the present invention
may be amplified prior to the hybridization reaction. The presence of a specific allele in the sample
is determined by detecting the presence or the absence of stable hybrid duplexes formed between the
probe and the target DNA. The detection of hybrid duplexes can be carried out by a number of
methods. Various detection assay formats are well known which utilize detectable labels bound to
either the target or the probe to enable detection of the hybrid duplexes. Typically, hybridization
duplexes are separated from unhybridized nucleic acids and the labels bound to the duplexes are then
detected. Those skilled in the art will recognize that wash steps may be employed to wash away
excess target DNA or probe as well as unbound conjugate. Further, standard heterogeneous assay
formats are suitable for detecting the hybrids using the labels present on the primers and probes.

Two recently developed assays allow hybridization-based allele discrimination with no
need for separations or washes (see Landegren U. et al., 1998). The TaqMan assay takes advantage
of the 5' nuclease activity of Taq DNA polymerase to digest a DNA probe annealed specifically to
the accumulating amplification product. TaqMan probes are labeled with a donor-acceptor dye pair
that interacts via fluorescence energy transfer. Cleavage of the TaqMan probe by the advancing
polymerase during amplification dissociates the donor dye from the quenching acceptor dye, greatly
increasing the donor fluorescence. All reagents necessary to detect two allelic variants can be
assembled at the beginning of the reaction and the results are monitored in real time (see Livak et al.,
1995). In an alternative homogeneous hybridization based procedure, molecular beacons are used
for allele discriminations. Molecular beacons are hairpin-shaped oligonucleotide probes that report
the presence of specific nucleic acids in homogeneous solutions. When they bind to their targets
they undergo a conformational reorganization that restores the fluorescence of an internally
quenched fluorophore (Tyagi et al., 1998).
The polynucleotides provided herein can be used to produce probes which can be used in
hybridization assays for the detection of biallelic marker alleles in biological samples. These probes
are characterized in that they preferably comprise between 8 and 50 nucleotides, and in that they are
sufficiently complementary to a sequence comprising a biallelic marker of the present invention to
hybridize thereto and preferably sufficiently specific to be able to discriminate the targeted sequence
for only one nucleotide variation. A particularly preferred probe is 25 nucleotides in length.
Preferably the biallelic marker is within 4 nucleotides of the center of the polynucleotide probe. In
particularly preferred probes, the biallelic marker is at the center of said polynucleotide. Preferred
probes comprise a nucleotide sequence selected from the group consisting of amplicons listed in
Table 1 and the sequences complementary thereto, or a fragment thereof, said fragment comprising
at least about 8 consecutive nucleotides, preferably 10, 15, 20, more preferably 25, 30, 40, 47, or 50
consecutive nucleotides and containing a polymorphic base. Preferred probes comprise a nucleotide
sequence selected from the group consisting of P1 to P58 and the sequences complementary thereto,
preferably P1 to P27, P34, P37 to P41, P43 to P49, P52, P54 to P58. In preferred embodiments the
polymorphic base(s) are within 5, 4, 3, 2, or 1 nucleotides of the center of the said polynucleotide,
more preferably at the center of said polynucleotide.
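
As a non-limiting illustration of the probe geometry described above, a pair of 25-nucleotide allele-specific probes with the biallelic marker at the central position could be derived from a context sequence as sketched below. The function name and the context sequence are hypothetical; the actual probes of the invention are those designated in the sequence listing, not the output of this sketch.

```python
def allele_specific_probes(sequence, marker_index, alleles, length=25):
    """Return {allele: probe}; each probe carries its allele at the central position.

    sequence     : context sequence surrounding the marker (made up in the example)
    marker_index : 0-based position of the biallelic marker in `sequence`
    alleles      : the two alternative bases at the marker
    length       : probe length; 25 is the particularly preferred length in the text
    """
    if length % 2 == 0:
        raise ValueError("use an odd length so that a single central position exists")
    half = length // 2
    start, end = marker_index - half, marker_index + half + 1
    if start < 0 or end > len(sequence):
        raise ValueError("not enough flanking sequence around the marker")
    left, right = sequence[start:marker_index], sequence[marker_index + 1:end]
    return {allele: left + allele + right for allele in alleles}


context = "ACGT" * 8  # made-up 32-base context, hypothetical T/C marker at index 15
probes = allele_specific_probes(context, 15, ("T", "C"))
for allele, probe in probes.items():
    print(allele, probe)
```

The two probes differ at a single position (index 12 of 25), which corresponds to the perfectly matched pair used for allele discrimination.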

Preferably the probes of the present invention are labeled or immobilized on a solid 
support. Labels and solid supports are further described in "Oligonucleotide Probes and Primers". 
The probes can be non-extendable as described in "Oligonucleotide Probes and Primers". 

By assaying the hybridization to an allele specific probe, one can detect the presence or
absence of a biallelic marker allele in a given sample. High-throughput parallel hybridizations in
array format are specifically encompassed within "hybridization assays" and are described below.

5) Hybridization To Addressable Arrays Of Oligonucleotides
Hybridization assays based on oligonucleotide arrays rely on the differences in
hybridization stability of short oligonucleotides to perfectly matched and mismatched target
sequence variants. Efficient access to polymorphism information is obtained through a basic
structure comprising high-density arrays of oligonucleotide probes attached to a solid support (e.g.,
the chip) at selected positions. Each DNA chip can contain thousands to millions of individual
synthetic DNA probes arranged in a grid-like pattern and miniaturized to the size of a dime.

The chip technology has already been applied with success in numerous cases. For
example, the screening of mutations has been undertaken in the BRCA1 gene, in S. cerevisiae
mutant strains, and in the protease gene of HIV-1 virus (Hacia et al., 1996; Shoemaker et al., 1996;
Kozal et al., 1996). Chips of various formats for use in detecting biallelic polymorphisms can be
produced on a customized basis by Affymetrix (GeneChip™), Hyseq (HyChip and HyGnostics), and
Protogene Laboratories.

In general, these methods employ arrays of oligonucleotide probes that are complementary
to target nucleic acid sequence segments from an individual, which target sequences include a
polymorphic marker. EP 785280 describes a tiling strategy for the detection of single nucleotide
polymorphisms. Briefly, arrays may generally be "tiled" for a large number of specific
polymorphisms. By "tiling" is generally meant the synthesis of a defined set of oligonucleotide
probes which is made up of a sequence complementary to the target sequence of interest, as well as
preselected variations of that sequence, e.g., substitution of one or more given positions with one or
more members of the basis set of monomers, i.e. nucleotides. Tiling strategies are further described
in PCT application No WO 95/11995. In a particular aspect, arrays are tiled for a number of
specific, identified biallelic marker sequences. In particular, the array is tiled to include a number of
detection blocks, each detection block being specific for a specific biallelic marker or a set of
biallelic markers. For example, a detection block may be tiled to include a number of probes, which
span the sequence segment that includes a specific polymorphism. To ensure probes that are
complementary to each allele, the probes are synthesized in pairs differing at the biallelic marker. In
addition to the probes differing at the polymorphic base, monosubstituted probes are also generally
tiled within the detection block. These monosubstituted probes have bases at and up to a certain
number of bases in either direction from the polymorphism, substituted with the remaining
nucleotides (selected from A, T, G, C and U). Typically the probes in a tiled detection block will
include substitutions of the sequence positions up to and including those that are 5 bases away from
the biallelic marker. The monosubstituted probes provide internal controls for the tiled array, to
distinguish actual hybridization from artefactual cross-hybridization. Upon completion of
hybridization with the target sequence and washing of the array, the array is scanned to determine
the position on the array to which the target sequence hybridizes. The hybridization data from the
scanned array is then analyzed to identify which allele or alleles of the biallelic marker are present in
the sample. Hybridization and scanning may be carried out as described in PCT application No WO
92/10092 and WO 95/11995 and US patent No 5,424,186.
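
The composition of a single detection block as described above may be sketched as follows. This is an illustrative assumption only, not the tiling layout of any cited reference: the function name, the label format, the restriction to the four DNA bases, and the example sequence are all hypothetical.

```python
BASES = "ACGT"  # DNA probes only; an assumption made for this sketch


def tile_detection_block(sequence, marker_index, alleles, probe_length=25, window=5):
    """Return {probe_sequence: label} for one biallelic marker.

    The block holds one perfectly matched probe per allele plus monosubstituted
    control probes covering every position up to `window` bases on either side
    of the marker (and the marker position itself).
    """
    half = probe_length // 2
    if probe_length % 2 == 0 or window > half:
        raise ValueError("probe_length must be odd and the window must fit in the probe")
    start, end = marker_index - half, marker_index + half + 1
    if start < 0 or end > len(sequence):
        raise ValueError("not enough flanking sequence around the marker")

    block = {}
    for allele in alleles:
        matched = sequence[start:marker_index] + allele + sequence[marker_index + 1:end]
        block[matched] = f"match:{allele}"  # always labelled as a matched probe
        for offset in range(-window, window + 1):
            pos = half + offset
            for base in BASES:
                candidate = matched[:pos] + base + matched[pos + 1:]
                # setdefault leaves probes that are already present untouched,
                # so matched probes are never relabelled as controls.
                block.setdefault(candidate, f"control:{allele}:{offset:+d}:{base}")
    return block


block = tile_detection_block("ACGT" * 8, 15, ("T", "C"))
print(len(block), "probes in the detection block")
```

With a 5-base window each allele contributes its matched probe and 30 flanking monosubstituted controls; the two remaining bases at the marker position are shared between the alleles and are stored once.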
Thus, in some embodiments, the chips may comprise an array of nucleic acid sequences of
fragments of about 15 nucleotides in length. In further embodiments, the chip may comprise an
array including at least one sequence comprising at least about 8 consecutive nucleotides,
preferably 10, 15, 20, more preferably 25, 30, 40, 47, or 50 consecutive nucleotides and containing a
polymorphic base. In preferred embodiments the polymorphic base is within 5, 4, 3, 2, or 1
nucleotides of the center of the said polynucleotide, more preferably at the center of said
polynucleotide. In some embodiments, the chip may comprise an array of at least 2, 3, 4, 5, 6, 7, 8
or more of these polynucleotides of the invention. Solid supports and polynucleotides of the present
invention attached to solid supports are further described in "Oligonucleotide Probes and Primers".
6) Integrated Systems 

Another technique which may be used to analyze polymorphisms involves
multicomponent integrated systems, which miniaturize and compartmentalize processes such as
PCR and capillary electrophoresis reactions in a single functional device. An example of such a
technique is disclosed in US patent 5,589,136, which describes the integration of PCR amplification
and capillary electrophoresis in chips.

Integrated systems can be envisaged mainly when microfluidic systems are used. These
systems comprise a pattern of microchannels designed onto a glass, silicon, quartz, or plastic wafer
included on a microchip. The movements of the samples are controlled by electric, electroosmotic
or hydrostatic forces applied across different areas of the microchip to create functional microscopic
valves and pumps with no moving parts.

For genotyping biallelic markers, the microfluidic system may integrate nucleic acid
amplification, microsequencing, capillary electrophoresis and a detection method such as
laser-induced fluorescence detection.

Methods Of Genetic Analysis Using The Biallelic Markers Of The Present Invention 
Different methods are available for the genetic analysis of complex traits (see Lander and
Schork, 1994). The search for disease-susceptibility genes is conducted using two main methods:
the linkage approach in which evidence is sought for cosegregation between a locus and a putative
trait locus using family studies, and the association approach in which evidence is sought for a
statistically significant association between an allele and a trait or a trait causing allele (Khoury et
al., 1993). In general, the biallelic markers of the present invention find use in any method known in
the art to demonstrate a statistically significant correlation between a genotype and a phenotype.
The biallelic markers may be used in parametric and non-parametric linkage analysis methods.
Preferably, the biallelic markers of the present invention are used to identify genes associated with
detectable traits using association studies, an approach which does not require the use of affected
families and which permits the identification of genes associated with complex and sporadic traits.

The genetic analysis using the biallelic markers of the present invention may be conducted
on any scale. The whole set of biallelic markers of the present invention or any subset of biallelic
markers of the present invention corresponding to the candidate gene may be used. Further, any set
of genetic markers including a biallelic marker of the present invention may be used. A set of
biallelic polymorphisms that could be used as genetic markers in combination with the biallelic
markers of the present invention has been described in WO 98/20165. As mentioned above, it
should be noted that the biallelic markers of the present invention may be included in any complete
or partial genetic map of the human genome. These different uses are specifically contemplated in
the present invention and claims.

Linkage Analysis

Linkage analysis is based upon establishing a correlation between the transmission of 
genetic markers and that of a specific trait throughout generations within a family. Thus, the aim of 
linkage analysis is to detect marker loci that show cosegregation with a trait of interest in pedigrees. 

Parametric Methods 

When data are available from successive generations there is the opportunity to study the
degree of linkage between pairs of loci. Estimates of the recombination fraction enable loci to be
ordered and placed onto a genetic map. With loci that are genetic markers, a genetic map can be
established, and then the strength of linkage between markers and traits can be calculated and used
to indicate the relative positions of markers and genes affecting those traits (Weir, 1996). The
classical method for linkage analysis is the logarithm of odds (lod) score method (see Morton, 1955;
Ott, 1991). Calculation of lod scores requires specification of the mode of inheritance for the
disease (parametric method). Generally, the length of the candidate region identified using linkage
analysis is between 2 and 20 Mb. Once a candidate region is identified as described above, analysis
of recombinant individuals using additional markers allows further delineation of the candidate
region. Linkage analysis studies have generally relied on the use of a maximum of 5,000
microsatellite markers, thus limiting the maximum theoretical attainable resolution of linkage
analysis to about 600 kb on average.
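
For orientation only, the two quantities discussed above can be sketched numerically. The lod score function below covers only the simplest textbook case of phase-known, fully informative meioses, which is an assumption of this sketch and not the general pedigree likelihood; the genome size of about 3,000 Mb is likewise an approximate figure assumed here, and the function names are hypothetical.

```python
import math


def lod_score(recombinants, meioses, theta):
    """Lod score for phase-known, fully informative meioses (simplest case only).

    Compares the likelihood of the data at recombination fraction `theta`
    with the likelihood under free recombination (theta = 0.5).
    """
    if not 0 < theta <= 0.5:
        raise ValueError("theta must lie in (0, 0.5]")
    non_recombinants = meioses - recombinants
    log_l_theta = recombinants * math.log10(theta) + non_recombinants * math.log10(1 - theta)
    log_l_null = meioses * math.log10(0.5)
    return log_l_theta - log_l_null


def average_marker_spacing_kb(genome_size_mb, n_markers):
    """Average distance between evenly spread markers, in kilobases."""
    return genome_size_mb * 1000 / n_markers


print(round(lod_score(recombinants=1, meioses=10, theta=0.1), 3))
print(average_marker_spacing_kb(3000, 5000))  # assumed ~3,000 Mb genome, 5,000 markers
```

Dividing an assumed genome of roughly 3,000 Mb by 5,000 markers gives the average spacing of about 600 kb mentioned in the text.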

Linkage analysis has been successfully applied to map simple genetic traits that show clear
Mendelian inheritance patterns and which have a high penetrance (i.e., the ratio between the number
of trait positive carriers of allele a and the total number of a carriers in the population). However,
parametric linkage analysis suffers from a variety of drawbacks. First, it is limited by its reliance on
the choice of a genetic model suitable for each studied trait. Furthermore, as already mentioned, the
resolution attainable using linkage analysis is limited, and complementary studies are required to
refine the analysis of the typical 2 Mb to 20 Mb regions initially identified through linkage analysis.
In addition, parametric linkage analysis approaches have proven difficult when applied to complex
genetic traits, such as those due to the combined action of multiple genes and/or environmental
factors. It is very difficult to model these factors adequately in a lod score analysis. In such cases,
too large an effort and cost are needed to recruit the adequate number of affected families required
for applying linkage analysis to these situations, as recently discussed by Risch, N. and Merikangas,
K. (1996).

Non-Parametric Methods

The advantage of the so-called non-parametric methods for linkage analysis is that they do
not require specification of the mode of inheritance for the disease; they therefore tend to be more
useful for the analysis of complex traits. In non-parametric methods, one tries to prove that the
inheritance pattern of a chromosomal region is not consistent with random Mendelian segregation by
showing that affected relatives inherit identical copies of the region more often than expected by chance.
Affected relatives should show excess "allele sharing" even in the presence of incomplete
penetrance and polygenic inheritance. In non-parametric linkage analysis the degree of agreement at
a marker locus in two individuals can be measured either by the number of alleles identical by state
(IBS) or by the number of alleles identical by descent (IBD). Affected sib pair analysis is a
well-known special case and is the simplest form of these methods.
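
The identity-by-state measure mentioned above can be illustrated with a short sketch, in which the function name and the genotype encoding (an unordered pair of allele symbols) are hypothetical. Identity by descent is not computed here, since it additionally requires pedigree information.

```python
from collections import Counter


def ibs_count(genotype_a, genotype_b):
    """Number of alleles (0, 1 or 2) that two individuals share identical by state.

    Each genotype is an unordered pair of alleles at one marker locus.
    """
    # The multiset intersection keeps, for each allele, the smaller of the two counts.
    shared = Counter(genotype_a) & Counter(genotype_b)
    return sum(shared.values())


print(ibs_count(("A", "G"), ("G", "A")))  # same heterozygous genotype
print(ibs_count(("A", "A"), ("A", "G")))
print(ibs_count(("A", "A"), ("G", "G")))
```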

The biallelic markers of the present invention may be used in both parametric and
non-parametric linkage analysis. Preferably biallelic markers may be used in non-parametric methods
which allow the mapping of genes involved in complex traits. The biallelic markers of the present
invention may be used in both IBD- and IBS- methods to map genes affecting a complex trait. In
such studies, taking advantage of the high density of biallelic markers, several adjacent biallelic
marker loci may be pooled to achieve the efficiency attained by multi-allelic markers (Zhao et al.,
1998).

Population Association Studies 

The present invention comprises methods for identifying if the BAP28 gene is associated
with a detectable trait using the biallelic markers of the present invention. In one embodiment the
present invention comprises methods to detect an association between a biallelic marker allele or a
biallelic marker haplotype and a trait. Further, the invention comprises methods to identify a trait
causing allele in linkage disequilibrium with any biallelic marker allele of the present invention.
As described above, alternative approaches can be employed to perform association
studies: genome-wide association studies, candidate region association studies and candidate gene
association studies. In a preferred embodiment, the biallelic markers of the present invention are
used to perform candidate gene association studies. The candidate gene analysis clearly provides a
short-cut approach to the identification of genes and gene polymorphisms related to a particular trait
when some information concerning the biology of the trait is available. Further, the biallelic
markers of the present invention may be incorporated in any map of genetic markers of the human
genome in order to perform genome-wide association studies. Methods to generate a high-density
map of biallelic markers have been described in US Provisional Patent application serial number
60/082,614. The biallelic markers of the present invention may further be incorporated in any map
of a specific candidate region of the genome (a specific chromosome or a specific chromosomal
segment for example).

As mentioned above, association studies may be conducted within the general population
and are not limited to studies performed on related individuals in affected families. Association
studies are extremely valuable as they permit the analysis of sporadic or multifactor traits.
Moreover, association studies represent a powerful method for fine-scale mapping enabling much
finer mapping of trait causing alleles than linkage studies. Studies based on pedigrees often only
narrow the location of the trait causing allele. Association studies using the biallelic markers of the
present invention can therefore be used to refine the location of a trait causing allele in a candidate
region identified by linkage analysis methods. Moreover, once a chromosome segment of interest
has been identified, the presence of a candidate gene, such as a candidate gene of the present
invention, in the region of interest can provide a shortcut to the identification of the trait causing
allele. Biallelic markers of the present invention can be used to demonstrate that a candidate gene is
associated with a trait. Such uses are specifically contemplated in the present invention.

Determining The Frequency Of A Biallelic Marker Allele Or Of A Biallelic Marker 
Haplotype In A Population 

Association studies explore the relationships among frequencies for sets of alleles between
loci.

Determining The Frequency Of An Allele In A Population 

Allelic frequencies of the biallelic markers in a population can be determined using one of
the methods described above under the heading "Methods for genotyping an individual for biallelic
markers", or any genotyping procedure suitable for this intended purpose. Genotyping pooled
samples or individual samples can determine the frequency of a biallelic marker allele in a
population. One way to reduce the number of genotypings required is to use pooled samples. A
major obstacle in using pooled samples is the accuracy and reproducibility with which DNA
concentrations can be determined when setting up the pools. Genotyping individual samples provides
higher sensitivity, reproducibility and accuracy and is the preferred method used in the present
invention. Preferably, each individual is genotyped separately and simple gene counting is applied
to determine the frequency of an allele of a biallelic marker or of a genotype in a given population.
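
Simple gene counting as referred to above can be sketched as follows. The function names and the genotype encoding are hypothetical, and the genotypes shown are made-up illustrative data, not results.

```python
from collections import Counter


def allele_frequencies(genotypes):
    """Gene counting: every individual contributes both of its alleles."""
    counts = Counter(allele for genotype in genotypes for allele in genotype)
    total = sum(counts.values())  # twice the number of individuals
    return {allele: n / total for allele, n in counts.items()}


def genotype_frequencies(genotypes):
    """Frequency of each unordered genotype among the individuals."""
    counts = Counter(tuple(sorted(genotype)) for genotype in genotypes)
    return {genotype: n / len(genotypes) for genotype, n in counts.items()}


# Made-up genotypes for four individuals at one biallelic marker.
sample = [("A", "A"), ("A", "G"), ("G", "A"), ("G", "G")]
print(allele_frequencies(sample))
print(genotype_frequencies(sample))
```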

The invention also relates to methods of estimating the frequency of an allele in a
population comprising: a) genotyping individuals from said population for said biallelic marker
according to the method of the present invention; and b) determining the proportional representation
of said biallelic marker in said population. In addition, the methods of estimating the frequency of an
allele in a population of the invention encompass methods with any further limitation described in
this disclosure, or those following, specified alone or in any combination: In some embodiments,
said BAP28-related biallelic marker is selected from the group consisting of A1 to A58, and the
complements thereof, or the biallelic markers in linkage disequilibrium therewith; In some
embodiments, said BAP28-related biallelic marker is selected from the group consisting of A1 to
A27, A34, A37 to A41, A43 to A49, A52, and A54 to A58, and the complements thereof, or the
biallelic markers in linkage disequilibrium therewith; In some embodiments, said BAP28-related
biallelic marker is selected from the group consisting of A1, A4, 16, A30, A31, A42, A50, A51, and
A53, and the complements thereof, or the biallelic markers in linkage disequilibrium therewith; In
some embodiments, the step of determining the frequency of a biallelic marker allele in a population
may be accomplished by determining the identity of the nucleotides for both copies of said biallelic
marker present in the genome of each individual in said population and calculating the proportional
representation of said nucleotide at said BAP28-related biallelic marker for the population; In some
embodiments, the step of determining the proportional representation may be accomplished by
performing a genotyping method of the invention on a pooled biological sample derived from a
representative number of individuals, or each individual, in said population, and calculating the
proportional amount of said nucleotide compared with the total.

Determining The Frequency Of A Haplotype In A Population 

The gametic phase of haplotypes is unknown when diploid individuals are heterozygous at
more than one locus. Using genealogical information in families, gametic phase can sometimes be
inferred (Perlin et al., 1994). When no genealogical information is available different strategies may
be used. One possibility is that the multiple-site heterozygous diploids can be eliminated from the
analysis, keeping only the homozygotes and the single-site heterozygote individuals, but this
approach might lead to a possible bias in the sample composition and the underestimation of
low-frequency haplotypes. Another possibility is that single chromosomes can be studied independently,
for example, by asymmetric PCR amplification (see Newton et al., 1989; Wu et al., 1989) or by
isolation of single chromosome by limit dilution followed by PCR amplification (see Ruano et al.,
1990). Further, a sample may be haplotyped for sufficiently close biallelic markers by double PCR
amplification of specific alleles (Sarkar, G. and Sommer, S. S., 1991). These approaches are not
entirely satisfying either because of their technical complexity, the additional cost they entail, their
lack of generalization at a large scale, or the possible biases they introduce. To overcome these
difficulties, an algorithm to infer the phase of PCR-amplified DNA genotypes introduced by Clark,
A.G. (1990) may be used. Briefly, the principle is to start filling a preliminary list of haplotypes
present in the sample by examining unambiguous individuals, that is, the complete homozygotes and
the single-site heterozygotes. Then other individuals in the same sample are screened for the
possible occurrence of previously recognized haplotypes. For each positive identification, the
complementary haplotype is added to the list of recognized haplotypes, until the phase information
for all individuals is either resolved or identified as unresolved. This method assigns a single
haplotype to each multiheterozygous individual, whereas several haplotypes are possible when there
is more than one heterozygous site. Alternatively, one can use methods estimating haplotype
frequencies in a population without assigning haplotypes to each individual. Preferably, a method
based on an expectation-maximization (EM) algorithm (Dempster et al., 1977) leading to
maximum-likelihood estimates of haplotype frequencies under the assumption of Hardy-Weinberg proportions
(random mating) is used (see Excoffier L. and Slatkin M., 1995). The EM algorithm is a generalized
iterative maximum-likelihood approach to estimation that is useful when data are ambiguous and/or
incomplete. The EM algorithm is used to resolve heterozygotes into haplotypes. Haplotype
estimations are further described below under the heading "Statistical Methods." Any other method
known in the art to determine or to estimate the frequency of a haplotype in a population may be
used.
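
The principle of the Clark inference procedure summarized above may be sketched as follows. This is a simplified reading of the steps as described in this paragraph, not a reimplementation of the published algorithm: all function names and the data encoding (one unordered allele pair per marker, single-letter alleles) are hypothetical, and the order in which individuals and haplotypes are tried is an arbitrary choice of the sketch.

```python
def split_unambiguous(genotype):
    """Return both haplotypes if at most one site is heterozygous, else None."""
    het_sites = [i for i, (a, b) in enumerate(genotype) if a != b]
    if len(het_sites) > 1:
        return None
    return "".join(a for a, _ in genotype), "".join(b for _, b in genotype)


def complement(genotype, haplotype):
    """Haplotype that pairs with `haplotype` to give `genotype`, or None if incompatible."""
    other = []
    for (a, b), h in zip(genotype, haplotype):
        if h == a:
            other.append(b)
        elif h == b:
            other.append(a)
        else:
            return None
    return "".join(other)


def clark_phase(genotypes):
    """Return ({individual index: (hap1, hap2)}, [indices left unresolved])."""
    known, resolved = [], {}
    # Step 1: seed the list with haplotypes of unambiguous individuals.
    for idx, genotype in enumerate(genotypes):
        pair = split_unambiguous(genotype)
        if pair is not None:
            resolved[idx] = pair
            known.extend(h for h in pair if h not in known)
    # Step 2: screen the remaining individuals for recognized haplotypes and
    # add each complementary haplotype to the list until nothing changes.
    progress = True
    while progress:
        progress = False
        for idx, genotype in enumerate(genotypes):
            if idx in resolved:
                continue
            for hap in known:
                other = complement(genotype, hap)
                if other is not None:
                    resolved[idx] = (hap, other)
                    if other not in known:
                        known.append(other)
                    progress = True
                    break
    unresolved = [i for i in range(len(genotypes)) if i not in resolved]
    return resolved, unresolved


# Made-up genotypes at two biallelic markers (A/G and C/T).
sample = [(("A", "A"), ("C", "C")), (("A", "G"), ("C", "T")), (("G", "G"), ("T", "T"))]
print(clark_phase(sample))
```

In this made-up sample the double heterozygote is resolved as AC/GT because AC was already recognized in a homozygous individual; a double heterozygote presented alone would be reported as unresolved.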

The invention also encompasses methods of estimating the frequency of a haplotype for a
set of biallelic markers in a population, comprising the steps of: a) genotyping at least one
BAP28-related biallelic marker according to a method of the invention for each individual in said
population; b) genotyping a second biallelic marker by determining the identity of the nucleotides at
said second biallelic marker for both copies of said second biallelic marker present in the genome of
each individual in said population; and c) applying a haplotype determination method to the
identities of the nucleotides determined in steps a) and b) to obtain an estimate of said frequency. In
addition, the methods of estimating the frequency of a haplotype of the invention encompass
methods with any further limitation described in this disclosure, or those following, specified alone
or in any combination: In some embodiments, said BAP28-related biallelic marker is selected from
the group consisting of A1 to A58, and the complements thereof, or the biallelic markers in linkage
disequilibrium therewith; In some embodiments, said BAP28-related biallelic marker is selected
from the group consisting of A1 to A27, A34, A37 to A41, A43 to A49, A52, and A54 to A58, and
the complements thereof, or the biallelic markers in linkage disequilibrium therewith; In some
embodiments, said BAP28-related biallelic marker is selected from the group consisting of A1, A4,
16, A30, A31, A42, A50, A51, and A53, and the complements thereof, or the biallelic markers in
linkage disequilibrium therewith; In some embodiments, said haplotype determination method is
performed by asymmetric PCR amplification, double PCR amplification of specific alleles, the Clark
algorithm, or an expectation-maximization algorithm.
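
For the simplest case of two biallelic markers, step c) performed with an expectation-maximization algorithm may be sketched as below. This is a minimal sketch assuming Hardy-Weinberg proportions; the function name, the genotype coding (number of copies of an arbitrarily chosen reference allele at each marker), the starting values, and the stopping rule are hypothetical choices made for the example.

```python
def em_haplotype_frequencies(genotypes, n_iter=100, tol=1e-10):
    """Estimate two-marker haplotype frequencies by EM.

    genotypes : list of (n1, n2), the number of copies (0, 1 or 2) of the
                reference allele carried at marker 1 and at marker 2.
    Returns {(h1, h2): frequency}, where h = 1 denotes the reference allele.
    """
    haplotypes = [(0, 0), (0, 1), (1, 0), (1, 1)]
    freq = {h: 0.25 for h in haplotypes}  # uninformative starting point
    n_chromosomes = 2 * len(genotypes)
    for _ in range(n_iter):
        counts = {h: 0.0 for h in haplotypes}
        for n1, n2 in genotypes:
            if n1 == 1 and n2 == 1:
                # E-step: only double heterozygotes are ambiguous; weight the
                # two possible phases by their current expected frequencies.
                p_cis = freq[(1, 1)] * freq[(0, 0)]
                p_trans = freq[(1, 0)] * freq[(0, 1)]
                total = p_cis + p_trans
                w = p_cis / total if total > 0 else 0.5
                counts[(1, 1)] += w
                counts[(0, 0)] += w
                counts[(1, 0)] += 1 - w
                counts[(0, 1)] += 1 - w
            else:
                # Phase is known: one chromosome carries the reference allele
                # wherever n >= 1, the other only where n == 2.
                counts[(int(n1 >= 1), int(n2 >= 1))] += 1
                counts[(int(n1 == 2), int(n2 == 2))] += 1
        # M-step: new frequencies are the expected haplotype counts per chromosome.
        new_freq = {h: c / n_chromosomes for h, c in counts.items()}
        change = max(abs(new_freq[h] - freq[h]) for h in haplotypes)
        freq = new_freq
        if change < tol:
            break
    return freq


# Made-up genotypes: two opposite double homozygotes and one double heterozygote.
print(em_haplotype_frequencies([(2, 2), (0, 0), (1, 1)]))
```

Unlike the Clark procedure, this approach yields frequency estimates without assigning a definite haplotype pair to each individual.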

Linkage Disequilibrium Analysis 

Linkage disequilibrium is the non-random association of alleles at two or more loci and
represents a powerful tool for mapping genes involved in disease traits (see Ajioka R.S. et al., 1997).
Biallelic markers, because they are densely spaced in the human genome and can be genotyped in
greater numbers than other types of genetic markers (such as RFLP or VNTR markers), are
particularly useful in genetic analysis based on linkage disequilibrium.

When a disease mutation is first introduced into a population (by a new mutation or the 
immigration of a mutation carrier), it necessarily resides on a single chromosome and thus on a 
single "background" or "ancestral" haplotype of linked markers. Consequently, there is complete 
disequilibrium between these markers and the disease mutation: one finds the disease mutation only 
in the presence of a specific set of marker alleles. Through subsequent generations recombination 
events occur between the disease mutation and these marker polymorphisms, and the disequilibrium 
gradually dissipates. The pace of this dissipation is a function of the recombination frequency, so 
the markers closest to the disease gene will manifest higher levels of disequilibrium than those that 
are further away. When not broken up by recombination, "ancestral" haplotypes and linkage 
disequilibrium between marker alleles at different loci can be tracked not only through pedigrees but 
also through populations. Linkage disequilibrium is usually seen as an association between one 
specific allele at one locus and another specific allele at a second locus. 
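
The dependence of this dissipation on the recombination frequency can be illustrated with the textbook recurrence D_t = D_0 (1 - theta)^t, where theta is the recombination fraction between the marker and the disease locus. This recurrence is not stated in the present text; it is used here only as a minimal sketch, and the function name and numerical values are hypothetical.

```python
def ld_after_generations(d0: float, theta: float, generations: int) -> float:
    """Expected disequilibrium after a number of generations of random mating.

    Assumes the standard recurrence D_t = D_0 * (1 - theta)**t, with theta the
    recombination fraction between the marker and the disease locus.
    """
    if not 0.0 <= theta <= 0.5:
        raise ValueError("theta must lie between 0 and 0.5")
    return d0 * (1.0 - theta) ** generations


# Hypothetical comparison: a marker close to the disease locus (theta = 0.001)
# versus a more distant one (theta = 0.05), both after 100 generations.
close_marker = ld_after_generations(0.25, 0.001, 100)
far_marker = ld_after_generations(0.25, 0.05, 100)
```

Under these assumptions the closely linked marker retains most of its initial disequilibrium while the distant marker retains almost none, which is the behaviour exploited for fine-scale mapping.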

The pattern or curve of disequilibrium between disease and marker loci is expected to 
exhibit a maximum that occurs at the disease locus. Consequently, the amount of linkage 
disequilibrium between a disease allele and closely linked genetic markers may yield valuable 
information regarding the location of the disease gene. For fine-scale mapping of a disease locus, it 
is useful to have some knowledge of the patterns of linkage disequilibrium that exist between 
markers in the studied region. As mentioned above, the mapping resolution achieved through the 
analysis of linkage disequilibrium is much higher than that of linkage studies. The high density of 
biallelic markers combined with linkage disequilibrium analysis provides powerful tools for fine- 
scale mapping. Different methods to calculate linkage disequilibrium are described below under the 
heading "Statistical Methods". 

Population-Based Case-Control Studies Of Trait-Marker Associations 
As mentioned above, the occurrence of pairs of specific alleles at different loci on the same 
chromosome is not random and the deviation from random is called linkage disequilibrium. 
Association studies focus on population frequencies and rely on the phenomenon of linkage 
disequilibrium. If a specific allele in a given gene is directly involved in causing a particular trait, its 
frequency will be statistically increased in an affected (trait positive) population, when compared to 
the frequency in a trait negative population or in a random control population. As a consequence of 
the existence of linkage disequilibrium, the frequency of all other alleles present in the haplotype 
carrying the trait-causing allele will also be increased in trait positive individuals compared to trait 
negative individuals or random controls. Therefore, association between the trait and any allele 
(specifically a biallelic marker allele) in linkage disequilibrium with the trait-causing allele will 
suffice to suggest the presence of a trait-related gene in that particular region. Case-control 
populations can be genotyped for biallelic markers to identify associations that narrowly locate a 
trait-causing allele. Because any marker in linkage disequilibrium with a given marker associated with 
a trait will itself be associated with the trait, linkage disequilibrium allows the relative frequencies in 
case-control populations of a limited number of genetic polymorphisms (specifically biallelic 
markers) to be analyzed as an alternative to screening all possible functional polymorphisms in order 
to find trait-causing alleles. Association studies compare the frequency of marker alleles in 
unrelated case-control populations, and represent powerful tools for the dissection of complex traits. 
Case-Control Populations (Inclusion Criteria) 

Population-based association studies do not concern familial inheritance but compare the 
prevalence of a particular genetic marker, or a set of markers, in case-control populations. They are 
case-control studies based on comparison of unrelated case (affected or trait positive) individuals 
and unrelated control (unaffected, trait negative or random) individuals. Preferably the control 
group is composed of unaffected or trait negative individuals. Further, the control group is 
ethnically matched to the case population. Moreover, the control group is preferably matched to the 
case population for the main known confounding factor for the trait under study (for example age- 
matched for an age-dependent trait). Ideally, individuals in the two samples are paired in such a way 
that they are expected to differ only in their disease status. The terms "trait positive population", 
"case population" and "affected population" are used interchangeably herein. 

An important step in the dissection of complex traits using association studies is the choice 
of case-control populations (see Lander and Schork, 1994). A major step in the choice of case- 
control populations is the clinical definition of a given trait or phenotype. Any genetic trait may be 
analyzed by the association method proposed here by carefully selecting the individuals to be 
included in the trait positive and trait negative phenotypic groups. Four criteria are often useful: 
clinical phenotype, age at onset, family history and severity. The selection procedure for continuous 
or quantitative traits (such as blood pressure for example) involves selecting individuals at opposite 
ends of the phenotype distribution of the trait under study, so as to include in these trait positive and 
trait negative populations individuals with non-overlapping phenotypes. Preferably, case-control 
populations are phenotypically homogeneous populations. Trait positive and trait negative 
populations consist of phenotypically uniform populations of individuals, each representing between 
1 and 98%, preferably between 1 and 80%, more preferably between 1 and 50%, still more 
preferably between 1 and 30%, and most preferably between 1 and 20% of the total population under 
study, and preferably selected among individuals exhibiting non-overlapping phenotypes. The 
clearer the difference between the two trait phenotypes, the greater the probability of detecting an 
association with biallelic markers. The selection of those drastically different but relatively uniform 
phenotypes enables efficient comparisons in association studies and the possible detection of marked 
differences at the genetic level, provided that the sample sizes of the populations under study are 
large enough. 
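
The selection of individuals at opposite ends of a quantitative phenotype distribution can be sketched as follows. This is only an illustration under stated assumptions: the function name, the `fraction` parameter, the identifiers and the blood pressure values are all hypothetical, and the high tail is arbitrarily treated as the trait positive group.

```python
def select_extremes(values, fraction):
    """Split individuals into the low and high tails of a quantitative trait.

    `values` maps an individual identifier to its measured trait value.
    `fraction` is the share of the sample placed in each tail, for example
    0.2 for the lowest and the highest 20% (hypothetical parameter).
    """
    if len(values) < 2:
        raise ValueError("need at least two individuals")
    if not 0.0 < fraction <= 0.5:
        raise ValueError("fraction must be in (0, 0.5]")
    ranked = sorted(values, key=values.get)   # identifiers by ascending value
    n = max(1, int(len(ranked) * fraction))   # number of individuals per tail
    return ranked[:n], ranked[-n:]            # (low tail, high tail)


# Hypothetical systolic blood pressure measurements for ten individuals.
blood_pressure = {
    "i1": 100, "i2": 150, "i3": 120, "i4": 180, "i5": 110,
    "i6": 130, "i7": 90, "i8": 160, "i9": 140, "i10": 125,
}
trait_negative, trait_positive = select_extremes(blood_pressure, 0.2)
```

Because each tail holds at most half of the sample, the two groups never share an individual, which corresponds to the non-overlapping phenotypes preferred above.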

In preferred embodiments, a first group of between 50 and 300 trait positive individuals, 
preferably about 100 individuals, are recruited according to their phenotypes. A similar number of 
control individuals are included in such studies. 

In the present invention, typical examples of inclusion criteria include, but are not 
restricted to, prostate cancer or aggressiveness of prostate cancer tumors. In one preferred 
embodiment of the present invention, association studies are carried out on the basis of the presence 
(trait positive) or absence (trait negative) of prostate cancer. 

Association studies can be carried out by the skilled technician using the biallelic markers 
of the invention defined above, with different trait positive and trait negative populations. Suitable 
further examples of association studies using biallelic markers of the BAP28 gene, including the 
biallelic markers A1 to A58, preferably A1 to A27, A34, A37 to A41, A43 to A49, A52, and A54 to 
A58, involve studies on the following populations: 

- a trait positive population suffering from a cancer and a healthy unaffected population, or 

- a trait positive population suffering from prostate cancer treated with agents acting 
against prostate cancer and suffering from side-effects resulting from this treatment and a trait 
negative population suffering from prostate cancer treated with the same agents without any substantial 
side-effects, or 

- a trait positive population suffering from prostate cancer treated with agents acting 
against prostate cancer showing a beneficial response and a trait negative population suffering from 
prostate cancer treated with the same agents without any beneficial response, or 

- a trait positive population suffering from prostate cancer presenting highly aggressive 
prostate cancer tumors and a trait negative population suffering from prostate cancer with prostate 
cancer tumors devoid of aggressiveness. 

Association Analysis 

The invention also comprises methods of detecting an association between a genotype and 
a phenotype, comprising the steps of: a) determining the frequency of at least one BAP28-related 
biallelic marker in a trait positive population according to a genotyping method of the invention; b) 
determining the frequency of said BAP28-related biallelic marker in a control population according 
to a genotyping method of the invention; and c) determining whether a statistically significant 
association exists between said genotype and said phenotype. In addition, the methods of detecting 
an association between a genotype and a phenotype of the invention encompass methods with any 
further limitation described in this disclosure, or those following, specified alone or in any 
combination: In some embodiments, said BAP28-related biallelic marker is selected from the group 
consisting of A1 to A58, and the complements thereof, or the biallelic markers in linkage 
disequilibrium therewith; In some embodiments, said BAP28-related biallelic marker is selected 
from the group consisting of A1 to A27, A34, A37 to A41, A43 to A49, A52, and A54 to A58, and 
the complements thereof, or the biallelic markers in linkage disequilibrium therewith; In some 
embodiments, said BAP28-related biallelic marker is selected from the group consisting of A1, A4, 
A16, A30, A31, A42, A50, A51, and A53, and the complements thereof, or the biallelic markers in 
linkage disequilibrium therewith; In some embodiments, said control population may be a trait 
negative population, or a random population; In some embodiments, each of said genotyping steps a) 
and b) may be performed on a pooled biological sample derived from each of said populations; In 
some embodiments, each of said genotyping steps a) and b) is performed separately on biological 
samples derived from each individual in said population or a subsample thereof. 

The general strategy to perform association studies using biallelic markers derived from a 
region carrying a candidate gene is to scan two groups of individuals (case-control populations) in 
order to measure and statistically compare the allele frequencies of the biallelic markers of the 
present invention in both groups. 

If a statistically significant association with a trait is identified for one or more of 
the analyzed biallelic markers, one can assume that either the associated allele is directly 
responsible for causing the trait (i.e. the associated allele is the trait causing allele), or, more likely, 
the associated allele is in linkage disequilibrium with the trait causing allele. The specific 
characteristics of the associated allele with respect to the candidate gene function usually give 
further insight into the relationship between the associated allele and the trait (causal or in linkage 
disequilibrium). If the evidence indicates that the associated allele within the candidate gene is most 
probably not the trait causing allele but is in linkage disequilibrium with the real trait causing allele, 
then the trait causing allele can be found by sequencing the vicinity of the associated marker, and 
performing further association studies with the polymorphisms that are revealed in an iterative 
manner. 

Association studies are usually run in two successive steps. In a first phase, the 
frequencies of a reduced number of biallelic markers from the candidate gene are determined in the 
trait positive and control populations. In a second phase of the analysis, the position of the genetic 
loci responsible for the given trait is further refined using a higher density of markers from the 
relevant region. However, if the candidate gene under study is relatively small in length, as is the 
case for BAP28, a single phase may be sufficient to establish significant associations. 

Haplotype Analysis 

As described above, when a chromosome carrying a disease allele first appears in a 
population as a result of either mutation or migration, the mutant allele necessarily resides on a 
chromosome having a set of linked markers: the ancestral haplotype. This haplotype can be tracked 
through populations and its statistical association with a given trait can be analyzed. 
Complementing single point (allelic) association studies with multi-point association studies, also 
called haplotype studies, increases the statistical power of association studies. Thus, a haplotype 
association study allows one to define the frequency and the type of the ancestral carrier haplotype. 
A haplotype analysis is important in that it increases the statistical power of an analysis involving 
individual markers. 

In a first stage of a haplotype frequency analysis, the frequency of the possible haplotypes 
based on various combinations of the identified biallelic markers of the invention is determined. 
The haplotype frequency is then compared for distinct populations of trait positive and control 
individuals. The number of trait positive individuals that should be subjected to this analysis to 
obtain statistically significant results usually ranges between 30 and 300, with a preferred number of 
individuals ranging between 50 and 150. The same considerations apply to the number of 
unaffected individuals (or random controls) used in the study. The results of this first analysis 
provide haplotype frequencies in case-control populations; for each evaluated haplotype frequency, a 
p-value and an odds ratio are calculated. If a statistically significant association is found, the relative 
risk for an individual carrying the given haplotype of being affected with the trait under study can be 
approximated. 
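
As an illustration of the odds ratio mentioned above, the following minimal sketch computes it from a 2x2 table of haplotype counts, together with an approximate 95% confidence interval. The interval uses Woolf's logarithmic method, which is an assumption of this sketch and is not named in the text; the function name and the counts are hypothetical. For a rare trait the odds ratio approximates the relative risk.

```python
import math


def haplotype_odds_ratio(case_with, case_without, control_with, control_without):
    """Odds ratio for carrying a haplotype, with a Woolf 95% interval.

    The four arguments are (estimated) chromosome counts: cases carrying or
    not carrying the haplotype, then controls carrying or not carrying it.
    All four counts must be strictly positive.
    """
    counts = (case_with, case_without, control_with, control_without)
    if min(counts) <= 0:
        raise ValueError("all four counts must be strictly positive")
    odds_ratio = (case_with * control_without) / (case_without * control_with)
    # Standard error of log(odds ratio) under Woolf's approximation.
    se_log = math.sqrt(sum(1.0 / c for c in counts))
    lower = math.exp(math.log(odds_ratio) - 1.96 * se_log)
    upper = math.exp(math.log(odds_ratio) + 1.96 * se_log)
    return odds_ratio, (lower, upper)


# Hypothetical counts: 30 of 100 case chromosomes and 15 of 100 control
# chromosomes carry the haplotype.
estimate, interval = haplotype_odds_ratio(30, 70, 15, 85)
```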

An additional embodiment of the present invention encompasses methods of detecting an 
association between a haplotype and a phenotype, comprising the steps of: a) estimating the 
frequency of at least one haplotype in a trait positive population, according to a method of the 
invention for estimating the frequency of a haplotype; b) estimating the frequency of said haplotype 
in a control population, according to a method of the invention for estimating the frequency of a 
haplotype; and c) determining whether a statistically significant association exists between said 
haplotype and said phenotype. In addition, the methods of detecting an association between a 
haplotype and a phenotype of the invention encompass methods with any further limitation 
described in this disclosure, or those following: In some embodiments, said BAP28-related biallelic 
marker is selected from the group consisting of A1 to A58, and the complements thereof, or the 
biallelic markers in linkage disequilibrium therewith; In some embodiments, said BAP28-related 
biallelic marker is selected from the group consisting of A1 to A27, A34, A37 to A41, A43 to A49, 
A52, and A54 to A58, and the complements thereof, or the biallelic markers in linkage 
disequilibrium therewith; In some embodiments, said BAP28-related biallelic marker is selected 
from the group consisting of A1, A4, A16, A30, A31, A42, A50, A51, and A53, and the complements 
thereof, or the biallelic markers in linkage disequilibrium therewith; In some embodiments, said 
control population is a trait negative population, or a random population. In some embodiments, 
said method comprises the additional steps of determining the phenotype in said trait positive and 
said control populations prior to step c). 
Interaction Analysis 

The biallelic markers of the present invention may also be used to identify patterns of 
biallelic markers associated with detectable traits resulting from polygenic interactions. The analysis 
of genetic interaction between alleles at unlinked loci requires individual genotyping using the 
techniques described herein. The analysis of allelic interaction among a selected set of biallelic 
markers with an appropriate level of statistical significance can be considered as a haplotype analysis. 
Interaction analysis consists of stratifying the case-control populations with respect to a given 
haplotype for the first loci and performing a haplotype analysis with the second loci within each 
subpopulation. 

Statistical methods used in association studies are further described below. 
Testing For Linkage In The Presence Of Association 

The biallelic markers of the present invention may further be used in TDT 
(transmission/disequilibrium test). TDT tests for both linkage and association and is not affected by 
population stratification. TDT requires data for affected individuals and their parents or data from 
unaffected sibs instead of from parents (see Spielmann S. et al., 1993; Schaid D.J. et al., 1996; 
Spielmann S. and Ewens W.J., 1998). Such combined tests generally reduce the false-positive 
errors produced by separate analyses. 
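
In its usual form, the TDT compares how often heterozygous parents transmit a given marker allele to an affected child with how often they transmit the alternative allele, using a McNemar-type statistic (b - c)^2 / (b + c) with one degree of freedom. The formula is the conventional one and is not spelled out in the present text; the sketch below, its function name and its counts are illustrative assumptions only.

```python
import math


def tdt_statistic(transmitted: int, untransmitted: int):
    """McNemar-type TDT statistic and its p-value (chi-square, 1 df).

    `transmitted` counts heterozygous parents passing the allele under study
    to an affected child, `untransmitted` those passing the other allele.
    """
    total = transmitted + untransmitted
    if total == 0:
        raise ValueError("no informative (heterozygous) parents")
    chi2 = (transmitted - untransmitted) ** 2 / total
    # Survival function of a chi-square with 1 df: erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value


# Hypothetical data: 60 transmissions versus 40 non-transmissions.
chi2, p_value = tdt_statistic(60, 40)
```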

Association Of Biallelic Markers Of BAP28 With Prostate Cancer 

Trait Positive And Control Populations 

Two groups of independent individuals were used: the overall trait positive and control 
populations included 491 individuals suffering from prostate cancer and 313 individuals without any 
sign of prostate cancer. A specific protocol for the collection of DNA samples from trait positive 
and control individuals is described in Example 5. The 491 affected individuals can be subdivided into 
197 familial cases and 294 sporadic cases. The sporadic cases comprise 70 informative sporadic 
cases. The 491 individuals suffering from prostate cancer can also be subdivided into a population of 
individuals who developed prostate cancer before the age of 65 and a population of individuals who 
developed prostate cancer after the age of 65. 

In order to have as much certainty as possible about the absence of prostate cancer in control 
individuals, it is preferred to conduct a PSA assay on this population. Several commercial 
assays can be used (WO 96/21042, incorporated herein by reference). In one preferred embodiment, a Hybritech 
assay is used and control individuals must have a level of PSA less than 2.8 ng/ml of serum in order 
to be selected as such. In another preferred embodiment, the Yang assay is used and trait negative 
individuals must have a level of PSA of less than 4 ng/ml of serum in order to be included in the 
population under study. More preferably, individuals in the control population are at least 65 years old. 

Association Analysis 

The association analysis showed an association between BAP28-related biallelic markers 
and prostate cancer, more particularly both familial prostate cancer and sporadic prostate cancer. The 
results of the association study are further detailed in Example 5. 

A single point analysis of the association study showed an association between biallelic 
markers of the BAP28 gene and prostate cancer. Preferably, sporadic prostate cancer is associated 
most strongly with the biallelic markers A28 (5-14/165), A4 (5-382/316), A1 (5-381/133), and A55 
(99-7182/49), which are of particular interest (Figures 5 and 6). These association results 
constitute new elements for studying the genetic susceptibility of individuals to prostate cancer, 
preferably to sporadic and familial prostate cancer. Further details concerning this association study 
are provided in Figures 5 and 6 and in Example 5. 
Similar association studies can also be carried out with other biallelic markers within the 
scope of the invention, preferably with biallelic markers in linkage disequilibrium with the markers 
associated with prostate cancer as described above, including the biallelic markers A1 to A58. 

Haplotype Analysis 

In the context of the present invention, a haplotype can be defined as a combination of 
biallelic markers found in a given individual and which may be associated more or less significantly, 
as a result of appropriate statistical analyses, with the expression of a given trait. 
The haplotype studies are detailed in Example 5. 

Several two-marker haplotypes were significantly associated with familial prostate cancer. 
One preferred two-marker haplotype including markers A30 (99-1572/440) and A32 (5-171/204), 
alleles TT respectively, was shown to be significantly associated with prostate cancer, preferably 
with familial prostate cancer. As shown in Figures 8, 9 and 12A, this haplotype presents a p-value 
of 2.5 x 10⁻⁶ for early onset familial prostate cancer (see Example 5). 

Several two-marker haplotypes were significantly associated with sporadic prostate cancer. 
One preferred two-marker haplotype including markers A16 (5-370/197) and A1 (5-381/133), 
alleles GA, was shown to be significantly associated with sporadic prostate cancer. As shown in 
Figures 10, 11 and 12B, this haplotype presents a p-value of 9.4 x 10⁻⁸ for informative sporadic 
prostate cancer (see Example 5). 

Several two-marker haplotypes were significantly associated with sporadic prostate cancer. 
One preferred two-marker haplotype including markers A53 (99-1601/402) and A4 (5-382/316), 
alleles TG, was shown to be significantly associated with sporadic prostate cancer. As shown in 
Figures 10, 11 and 12C, this haplotype presents a p-value of 1 x 10⁻⁵ for informative sporadic 
prostate cancer (see Example 5). 

Several three-biallelic marker haplotypes are described in Example 5. 

The permutation tests clearly validated the statistical significance of the association 
between these haplotypes and prostate cancer (see Example 5). All these haplotypes can be used 
in the diagnosis of prostate cancer, more particularly either familial prostate cancer or sporadic prostate 
cancer. 

This information is extremely valuable. The knowledge of a potential genetic 
predisposition to prostate cancer, even if this predisposition is not absolute, might contribute in a 
very significant manner to treatment efficacy of prostate cancer and to the development of new 
therapeutic and diagnostic tools. 

The invention concerns a haplotype comprising at least one biallelic marker selected from 
the group consisting of A1 to A58, preferably A54, A58, A57, A56, A55, A1, A4, A5, A7, A11, 
A12, A16, A19, A21, A25, A27, A28, A29, A35, A33, A34, A32, A31, A30, A50, A51, A42, A53, 
A43, and A48, more preferably A1, A4, A30, A31, A42, A51, and A53. 
Statistical Methods 

In general, any method known in the art to test whether a trait and a genotype show a 
statistically significant correlation may be used. 

1) Methods In Linkage Analysis 

Statistical methods and computer programs useful for linkage analysis are well-known to 
those skilled in the art (see Terwilliger J.D. and Ott J., 1994; Ott J., 1991). 

2) Methods To Estimate Haplotype Frequencies In A Population 

As described above, when genotypes are scored, it is often not possible to distinguish the phase of 
heterozygotes, so that haplotype frequencies cannot be easily inferred. When the gametic phase is 
not known, haplotype frequencies can be estimated from the multilocus genotypic data. Any method 
known to a person skilled in the art can be used to estimate haplotype frequencies (see Lange K., 
1997; Weir, B.S., 1996). Preferably, maximum-likelihood haplotype frequencies are computed using 
an Expectation-Maximization (EM) algorithm (see Dempster et al., 1977; Excoffier L. and Slatkin 
M., 1995). This procedure is an iterative process aiming at obtaining maximum-likelihood estimates 
of haplotype frequencies from multi-locus genotype data when the gametic phase is unknown. 
Haplotype estimations are usually performed by applying the EM algorithm using for example the 
EM-HAPLO program (Hawley M. E. et al., 1994) or the Arlequin program (Schneider et al., 1997). 
The EM algorithm is a generalized iterative maximum likelihood approach to estimation and is 
briefly described below. 

Please note that in the present section, "Methods To Estimate Haplotype Frequencies In A 
Population," phenotypes will refer to multi-locus genotypes with unknown phase. 
Genotypes will refer to known-phase multi-locus genotypes. 

A sample of N unrelated individuals is typed for K markers. The data observed are the 
unknown-phase K-locus phenotypes that can be categorized in F different phenotypes. Suppose that we 
have H underlying possible haplotypes (in the case of K biallelic markers, $H = 2^K$). 

For phenotype j, suppose that $c_j$ genotypes are possible. We thus have the following 
equation: 

$$P_j = \sum_{i=1}^{c_j} pr(\text{genotype}_i) = \sum_{i=1}^{c_j} pr(h_k, h_l) \qquad \text{Equation 1}$$

where $P_j$ is the probability of the phenotype j, and $h_k$ and $h_l$ are the two haplotypes constituting 
the genotype i. Under the Hardy-Weinberg equilibrium, $pr(h_k, h_l)$ becomes: 

$$pr(h_k, h_l) = pr(h_k)^2 \ \text{if } h_k = h_l, \qquad pr(h_k, h_l) = 2\,pr(h_k)\,pr(h_l) \ \text{if } h_k \neq h_l \qquad \text{Equation 2}$$

The successive steps of the E-M algorithm can be described as follows: 

Starting with initial values of the haplotype frequencies, noted $p_1^{(0)}, p_2^{(0)}, \ldots, p_H^{(0)}$, 
these initial values serve to estimate the genotype frequencies (Expectation step) and then estimate 
another set of haplotype frequencies (Maximization step), noted $p_1^{(1)}, p_2^{(1)}, \ldots, p_H^{(1)}$. These two steps 
are iterated until changes in the sets of haplotype frequencies are very small. 

A stop criterion can be that the maximum difference between haplotype frequencies 
between two iterations is less than 10⁻⁷. This value can be adjusted according to the desired 
precision of estimations. 

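At a given iteration s, the Expectation step consists of calculating the genotype 
frequencies by the following equation: 

$$pr(\text{genotype}_i)^{(s)} = pr(\text{phenotype}_j) \cdot pr(\text{genotype}_i \mid \text{phenotype}_j)^{(s)} = \frac{n_j}{N} \cdot \frac{pr(h_k, h_l)^{(s)}}{P_j^{(s)}} \qquad \text{Equation 3}$$

where $n_j$ is the number of individuals showing phenotype j among the N individuals sampled, 
genotype i occurs in phenotype j, and $h_k$ and $h_l$ constitute genotype i. Each 
probability is derived according to Equation 1 and Equation 2 described above. 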



Then the Maximization step simply estimates another set of haplotype frequencies given 
the genotype frequencies. This approach is also known as the gene-counting method (Smith, 1957). 

$$p_t^{(s+1)} = \frac{1}{2} \sum_{j=1}^{F} \sum_{i=1}^{c_j} \delta_{it} \, pr(\text{genotype}_i)^{(s)} \qquad \text{Equation 4}$$

where $\delta_{it}$ is an indicator variable which counts the number of times haplotype t occurs in genotype 
i. It takes the values of 0, 1 or 2. 

To ensure that the estimation finally obtained is the maximum-likelihood estimation, 
several starting values are required. The estimations obtained are compared and, if they are 
different, the estimations leading to the best likelihood are kept. 
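
The Expectation and Maximization steps of Equations 1 to 4 can be sketched in a few lines of code. The following is a minimal illustration, not the implementation used by EM-HAPLO or Arlequin: the genotype coding (number of copies of one arbitrarily chosen allele per marker), the function names and the single uniform starting point are all assumptions made for this example. In practice several starting points would be tried, as explained above.

```python
from collections import Counter
from itertools import product


def compatible_pairs(genotype):
    """Unordered haplotype pairs (h_k, h_l) consistent with an unphased genotype.

    `genotype` holds, for each marker, the number of copies (0, 1 or 2) of
    the allele coded 1.
    """
    het = [m for m, g in enumerate(genotype) if g == 1]
    base = [g // 2 for g in genotype]  # homozygous sites; het sites set below
    pairs = set()
    for bits in product((0, 1), repeat=len(het)):
        h1, h2 = list(base), list(base)
        for m, b in zip(het, bits):
            h1[m], h2[m] = b, 1 - b
        pairs.add(tuple(sorted((tuple(h1), tuple(h2)))))
    return sorted(pairs)


def em_haplotype_frequencies(genotypes, tol=1e-7, max_iter=1000):
    """Estimate haplotype frequencies from unphased multi-locus genotypes."""
    counts = Counter(map(tuple, genotypes))        # n_j for each phenotype j
    n_total = sum(counts.values())                 # N
    pairs = {g: compatible_pairs(g) for g in counts}
    # Only haplotypes compatible with the data can receive a non-zero frequency.
    haplotypes = sorted({h for ps in pairs.values() for pair in ps for h in pair})
    freq = {h: 1.0 / len(haplotypes) for h in haplotypes}  # uniform start

    for _ in range(max_iter):
        new = dict.fromkeys(haplotypes, 0.0)
        for g, n_j in counts.items():
            # Equation 2: Hardy-Weinberg probability of each compatible genotype.
            probs = [freq[a] ** 2 if a == b else 2 * freq[a] * freq[b]
                     for a, b in pairs[g]]
            p_j = sum(probs)                       # Equation 1
            for (a, b), pr in zip(pairs[g], probs):
                weight = (n_j / n_total) * pr / p_j   # Equation 3 (E step)
                new[a] += weight / 2               # Equation 4 (M step): each of
                new[b] += weight / 2               # the two copies counts 1/2
        change = max(abs(new[h] - freq[h]) for h in haplotypes)
        freq = new
        if change < tol:                           # stop criterion
            break
    return freq


# Hypothetical sample typed at two markers, including two double heterozygotes.
sample = [(0, 0)] * 4 + [(2, 2)] * 4 + [(1, 1)] * 2
estimated = em_haplotype_frequencies(sample)
```

In this hypothetical sample the two double heterozygotes are ambiguous, and the iterations assign them almost entirely to the haplotype pair already supported by the homozygous individuals.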

3) Methods To Calculate Linkage Disequilibrium Between Markers 

A number of methods can be used to calculate linkage disequilibrium between any two 
genetic positions. In practice, linkage disequilibrium is measured by applying a statistical association 
test to haplotype data taken from a population. 

Linkage disequilibrium between any pair of biallelic markers comprising at least one of the 
biallelic markers of the present invention ($M_i$, $M_j$) having alleles ($a_i$/$b_i$) at marker $M_i$ and alleles 
($a_j$/$b_j$) at marker $M_j$ can be calculated for every allele combination ($a_i,a_j$; $a_i,b_j$; $b_i,a_j$ and $b_i,b_j$), according 
to the Piazza formula: 

$$\Delta_{a_i a_j} = \sqrt{\theta_4} - \sqrt{(\theta_4 + \theta_3)(\theta_4 + \theta_2)}$$

where: 

$\theta_4$ = - - = frequency of genotypes not having allele $a_i$ at $M_i$ and not having allele $a_j$ at $M_j$ 

$\theta_3$ = - + = frequency of genotypes not having allele $a_i$ at $M_i$ and having allele $a_j$ at $M_j$ 

$\theta_2$ = + - = frequency of genotypes having allele $a_i$ at $M_i$ and not having allele $a_j$ at $M_j$ 
Linkage disequilibrium (LD) between pairs of biallelic markers ($M_i$, $M_j$) can also be 
calculated for every allele combination ($a_i,a_j$; $a_i,b_j$; $b_i,a_j$ and $b_i,b_j$), according to the maximum- 
likelihood estimate (MLE) for delta (the composite genotypic disequilibrium coefficient), as 
described by Weir (Weir B. S., 1996). The MLE for the composite linkage disequilibrium is: 

$$D_{a_i a_j} = (2n_1 + n_2 + n_3 + n_4/2)/N - 2\,(pr(a_i) \cdot pr(a_j))$$

where $n_1$ = Σ phenotype ($a_i/a_i$, $a_j/a_j$), $n_2$ = Σ phenotype ($a_i/a_i$, $a_j/b_j$), $n_3$ = Σ phenotype ($a_i/b_i$, 
$a_j/a_j$), $n_4$ = Σ phenotype ($a_i/b_i$, $a_j/b_j$) and N is the number of individuals in the sample. 

This formula allows linkage disequilibrium between alleles to be estimated when only 
genotype, and not haplotype, data are available. 
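
The composite estimate above only needs genotype counts, which the following minimal sketch makes explicit. The genotype coding (number of copies of allele a_i at the first marker and of allele a_j at the second) and the function name are assumptions of this example, and the allele frequencies are simply counted from the same sample.

```python
def composite_ld(genotypes):
    """Composite disequilibrium between alleles a_i and a_j.

    Each genotype is a pair (copies of a_i at marker M_i, copies of a_j at
    marker M_j), every entry being 0, 1 or 2.
    """
    n = len(genotypes)
    if n == 0:
        raise ValueError("empty sample")
    n1 = sum(1 for g in genotypes if g == (2, 2))   # a_i/a_i, a_j/a_j
    n2 = sum(1 for g in genotypes if g == (2, 1))   # a_i/a_i, a_j/b_j
    n3 = sum(1 for g in genotypes if g == (1, 2))   # a_i/b_i, a_j/a_j
    n4 = sum(1 for g in genotypes if g == (1, 1))   # a_i/b_i, a_j/b_j
    p_ai = sum(g[0] for g in genotypes) / (2 * n)   # frequency of allele a_i
    p_aj = sum(g[1] for g in genotypes) / (2 * n)   # frequency of allele a_j
    return (2 * n1 + n2 + n3 + n4 / 2) / n - 2 * p_ai * p_aj


# Hypothetical samples: no association when a single genotype is fixed,
# strong association when the two alleles always occur together.
no_association = composite_ld([(2, 2)] * 4)
strong_association = composite_ld([(2, 2), (2, 2), (0, 0), (0, 0)])
```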

Another means of calculating the linkage disequilibrium between markers is as follows. 
For a couple of biallelic markers, $M_i$ ($a_i$/$b_i$) and $M_j$ ($a_j$/$b_j$), fitting the Hardy-Weinberg equilibrium, 
one can estimate the four possible haplotype frequencies in a given population according to the 
approach described above. 

The estimation of gametic disequilibrium between $a_i$ and $a_j$ is simply: 

$$D_{a_i a_j} = pr(\text{haplotype}(a_i, a_j)) - pr(a_i) \cdot pr(a_j)$$

where $pr(a_i)$ is the probability of allele $a_i$, $pr(a_j)$ is the probability of allele $a_j$, and 
$pr(\text{haplotype}(a_i, a_j))$ is estimated as in Equation 3 above. 

For a couple of biallelic markers only one measure of disequilibrium is necessary to 
describe the association between $M_i$ and $M_j$. 

Then a normalized value of the above is calculated as follows: 

$$D'_{a_i a_j} = D_{a_i a_j} / \max(-pr(a_i) \cdot pr(a_j),\ -pr(b_i) \cdot pr(b_j)) \quad \text{with } D_{a_i a_j} < 0$$

$$D'_{a_i a_j} = D_{a_i a_j} / \min(pr(b_i) \cdot pr(a_j),\ pr(a_i) \cdot pr(b_j)) \quad \text{with } D_{a_i a_j} > 0$$

The skilled person will readily appreciate that other linkage disequilibrium calculation 
methods can be used. 
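
The gametic disequilibrium and its normalized value can be computed directly from an estimated haplotype frequency and the two allele frequencies. The sketch below follows the conventional Lewontin normalization, in which a positive D is divided by the smaller of the two products pr(a_i).pr(b_j) and pr(b_i).pr(a_j), so that the normalized value cannot exceed 1; this choice, the function name and the example frequencies are assumptions of the example.

```python
def gametic_disequilibrium(p_haplotype, p_ai, p_aj):
    """Return (D, D') for alleles a_i and a_j of two biallelic markers.

    `p_haplotype` is the estimated frequency of the haplotype (a_i, a_j);
    `p_ai` and `p_aj` are the frequencies of alleles a_i and a_j.
    """
    p_bi, p_bj = 1.0 - p_ai, 1.0 - p_aj
    d = p_haplotype - p_ai * p_aj
    if d < 0:
        # Both candidates are negative; the larger one bounds D from below.
        bound = max(-p_ai * p_aj, -p_bi * p_bj)
    else:
        # Conventional bound for positive D (smaller of the two products).
        bound = min(p_ai * p_bj, p_bi * p_aj)
    d_prime = d / bound if bound != 0 else 0.0
    return d, d_prime


# Hypothetical frequencies: the two alleles always travel together.
d, d_prime = gametic_disequilibrium(0.5, 0.5, 0.5)
```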

Linkage disequilibrium among a set of biallelic markers having an adequate heterozygosity 
rate can be determined by genotyping between 50 and 1000 unrelated individuals, preferably 
between 75 and 200, more preferably around 100. 

4) Testing For Association 

The statistical significance of a correlation between a phenotype 
and a genotype, in this case an allele at a biallelic marker or a haplotype made up of such alleles, 
may be determined by any statistical test known in the art and with any accepted threshold of 
statistical significance being required. The application of particular methods and thresholds of 
significance is well within the skill of the ordinary practitioner of the art. 
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Testing for association is performed by determining the frequency of a biallelic marker 
allele in case and control populations and comparing these frequencies with a statistical test to 
determine if their is a statistically significant difference in frequency which would indicate a 
correlation between the trait and the biallelic marker allele under study. Similarly, a haplotype 
5 analysis is performed by estimating the frequencies of all possible haplotypes for a given set of 
biallelic markers in case and control populations, and comparing these frequencies with a statistical 
test to determine if their is a statistically significant correlation between the haplotype and the 
phenotype (trait) under study. Any statistical tool useful to test for a statistically significant 
association between a genotype and a phenotype may be used. Preferably the statistical test 
10 employed is a chi-square test with one degree of freedom. A P- value is calculated (the P-value is the 
probability that a statistic as large or larger than the observed one would occur by chance). 
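
By way of illustration only, the preferred chi-square test with one degree of freedom can be 
sketched for a 2 x 2 table of allele counts in cases and controls. The function name, the 
argument names and any counts passed to it are hypothetical and are not taken from the present 
disclosure. The P-value relies on the identity P = erfc(sqrt(chi2 / 2)), which holds for one 
degree of freedom. 

```python
import math


def allele_chi_square(case_a, case_b, control_a, control_b):
    """Pearson chi-square test (1 degree of freedom) on a 2x2 table of allele counts.

    case_a, case_b       : counts of alleles a and b in the case population
    control_a, control_b : counts of alleles a and b in the control population
    Returns (chi2, p_value).
    """
    n = case_a + case_b + control_a + control_b
    row_case = case_a + case_b
    row_control = control_a + control_b
    col_a = case_a + control_a
    col_b = case_b + control_b
    denom = row_case * row_control * col_a * col_b
    if denom == 0:
        raise ValueError("every row and column of the table must be non-empty")
    # Shortcut formula for the Pearson statistic of a 2x2 table
    chi2 = n * (case_a * control_b - case_b * control_a) ** 2 / denom
    # Upper tail of the chi-square distribution with one degree of freedom
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value
```

Identical allele frequencies in the two populations give a statistic of zero and a P-value of 
one; the P-value falls as the difference in frequency grows. 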
Statistical Significance 

In preferred embodiments, for significance for diagnostic purposes, either as a positive basis 
for further diagnostic tests or as a preliminary starting point for early preventive therapy, the 
p-value related to a biallelic marker association is preferably about 1 x 10^-2 or less, more 
preferably about 1 x 10^-4 or less, for a single biallelic marker analysis, and about 1 x 10^-3 or 
less, still more preferably 1 x 10^-6 or less, and most preferably about 1 x 10^-8 or less, for a 
haplotype analysis involving two or more markers. These values are believed to be applicable to 
any association studies involving single or multiple marker combinations. 

The skilled person can use the range of values set forth above as a starting point in order to 
carry out association studies with biallelic markers of the present invention. In doing so, significant 
associations between the biallelic markers of the present invention and a trait can be revealed and 
used for diagnosis and drug screening purposes. 
Phenotypic Permutation 

In order to confirm the statistical significance of the first stage haplotype analysis 
described above, it might be suitable to perform further analyses in which genotyping data from 
case-control individuals are pooled and randomized with respect to the trait phenotype. Each 
individual's genotyping data are randomly allocated to one of two groups, which contain the same 
number of individuals as the case-control populations used to compile the data obtained in the 
first stage. A second stage haplotype analysis is preferably run on these artificial groups, 
preferably for the markers included in the haplotype of the first stage analysis showing the 
highest relative risk coefficient. This experiment is reiterated preferably between 100 and 
10,000 times. The repeated iterations allow the determination of the probability of obtaining the 
tested haplotype by chance. 
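
By way of illustration only, the permutation procedure can be sketched as follows. The 
second stage haplotype analysis is represented here by an arbitrary callable, so that any 
statistic may be substituted; the example statistic (a simple difference in carrier frequency), 
the function names, the default number of iterations and the seed are all assumptions made for 
the sketch and do not form part of the present disclosure. 

```python
import random


def permutation_p_value(case_data, control_data, statistic, n_iter=1000, seed=0):
    """Estimate how often a statistic at least as large as the observed one arises by chance.

    case_data, control_data : per-individual genotyping data
    statistic               : callable(cases, controls) -> float (stand-in for the analysis)
    """
    rng = random.Random(seed)
    observed = statistic(case_data, control_data)
    pooled = list(case_data) + list(control_data)
    n_cases = len(case_data)
    hits = 0
    for _ in range(n_iter):
        # Randomize individuals with respect to the trait phenotype
        rng.shuffle(pooled)
        # The artificial groups keep the sizes of the original case and control groups
        if statistic(pooled[:n_cases], pooled[n_cases:]) >= observed:
            hits += 1
    return hits / n_iter


def carrier_frequency_difference(cases, controls):
    """Hypothetical statistic: 1 marks a carrier of the tested haplotype, 0 a non-carrier."""
    return sum(cases) / len(cases) - sum(controls) / len(controls)
```

A small returned value indicates that the observed result is rarely reproduced when the 
phenotype labels are randomized. 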

Assessment Of Statistical Association 

To address the problem of false positives, a similar analysis may be performed with the same 
case-control populations in random genomic regions. Results in random regions and the candidate 
region are compared as described in a co-pending US Provisional Patent Application entitled 
"Methods, Software And Apparati For Identifying Genomic Regions Harboring A Gene Associated 
With A Detectable Trait," U.S. Serial Number 60/107,986, filed November 10, 1998, the contents 
of which are incorporated herein by reference. 

5) Evaluation Of Risk Factors 

The association between a risk factor (in genetic epidemiology the risk factor is the 
presence or the absence of a certain allele or haplotype at marker loci) and a disease is measured by 
the odds ratio (OR) and by the relative risk (RR). If P(R^+) is the probability of developing the 
disease for individuals with R and P(R^-) is the probability for individuals without the risk factor, then 
the relative risk is simply the ratio of the two probabilities, that is: 

RR = P(R^+)/P(R^-) 

In case-control studies, direct measures of the relative risk cannot be obtained because of 
the sampling design. However, the odds ratio allows a good approximation of the relative risk for 
low-incidence diseases and can be calculated: 

OR = (F^+/(1-F^+)) / (F^-/(1-F^-)) 

F^+ is the frequency of the exposure to the risk factor in cases and F^- is the frequency of the 
exposure to the risk factor in controls. F^+ and F^- are calculated using the allelic or haplotype 
frequencies of the study and further depend on the underlying genetic model (dominant, recessive, 
additive...). 
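
By way of illustration only, the odds ratio can be computed from the two exposure frequencies 
as sketched below. The helper deriving an exposure frequency from an allele frequency is an 
assumption of the sketch: it supposes Hardy-Weinberg proportions and covers only the dominant 
and recessive models. All names are hypothetical. 

```python
def exposure_frequency(allele_freq, model):
    """Frequency of exposure to the risk factor, ASSUMING Hardy-Weinberg proportions.

    dominant  : carriers of at least one copy of the risk allele
    recessive : carriers of two copies of the risk allele
    """
    if model == "dominant":
        return 1.0 - (1.0 - allele_freq) ** 2
    if model == "recessive":
        return allele_freq ** 2
    raise ValueError("model not covered by this sketch: " + str(model))


def odds_ratio(f_cases, f_controls):
    """OR = (F+ / (1 - F+)) / (F- / (1 - F-))."""
    if not (0.0 < f_cases < 1.0 and 0.0 < f_controls < 1.0):
        raise ValueError("exposure frequencies must lie strictly between 0 and 1")
    return (f_cases / (1.0 - f_cases)) / (f_controls / (1.0 - f_controls))
```

Equal exposure frequencies in cases and controls give an odds ratio of one; a higher exposure 
frequency in cases gives an odds ratio above one. 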

One can further estimate the attributable risk (AR) which describes the proportion of 
individuals in a population exhibiting a trait due to a given risk factor. This measure is important in 
quantifying the role of a specific factor in disease etiology and in terms of the public health impact 
of a risk factor. The public health relevance of this measure lies in estimating the proportion of 
cases of disease in the population that could be prevented if the exposure of interest were absent. 
AR is determined as follows: 

AR = P_E(RR-1) / (P_E(RR-1)+1) 

AR is the risk attributable to a biallelic marker allele or a biallelic marker haplotype. P_E is 
the frequency of exposure to an allele or a haplotype within the population at large; and RR is the 
relative risk, which is approximated with the odds ratio when the trait under study has a relatively 
low incidence in the general population. 
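
By way of illustration only, the attributable risk formula can be evaluated as sketched 
below. The function and argument names are hypothetical; the relative risk passed in may be 
the odds ratio where the low-incidence approximation applies. 

```python
def attributable_risk(p_exposed, relative_risk):
    """AR = P_E (RR - 1) / (P_E (RR - 1) + 1).

    p_exposed     : frequency of exposure to the allele or haplotype in the population at large
    relative_risk : RR, or the odds ratio used as its approximation
    """
    excess = p_exposed * (relative_risk - 1.0)
    return excess / (excess + 1.0)
```

For example, an exposure frequency of 0.5 with a relative risk of 3 gives 
AR = (0.5 x 2) / (0.5 x 2 + 1) = 0.5, and a relative risk of 1 gives AR = 0. 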

Identification Of Biallelic Markers In Linkage Disequilibrium With The Biallelic 
Markers of the Invention 
Once a first biallelic marker has been identified in a genomic region of interest, the 
practitioner of ordinary skill in the art, using the teachings of the present invention, can easily 
identify additional biallelic markers in linkage disequilibrium with this first marker. As mentioned 
before, any marker in linkage disequilibrium with a first marker associated with a trait will be 
associated with the trait. Therefore, once an association has been demonstrated between a given 
biallelic marker and a trait, the discovery of additional biallelic markers associated with this trait is 
of great interest in order to increase the density of biallelic markers in this particular region. The 
causal gene or mutation will be found in the vicinity of the marker or set of markers showing the 
highest correlation with the trait. 

Identification of additional markers in linkage disequilibrium with a given marker 
involves: (a) amplifying a genomic fragment comprising a first biallelic marker from a plurality of 
individuals; (b) identifying second biallelic markers in the genomic region harboring said first 
biallelic marker; (c) conducting a linkage disequilibrium analysis between said first biallelic marker 
and second biallelic markers; and (d) selecting said second biallelic markers as being in linkage 
disequilibrium with said first marker. Subcombinations comprising steps (b) and (c) are also 
contemplated. 
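
By way of illustration only, selection step (d) can be sketched as a filter over normalized 
disequilibrium values already computed in step (c). The threshold of 0.8, the function name and 
the marker labels used with it are assumptions of the sketch; the present disclosure does not 
fix a cut-off. 

```python
def select_markers_in_ld(d_prime_by_marker, threshold=0.8):
    """Return the second markers whose |D'| with the first marker meets the threshold.

    d_prime_by_marker : dict mapping a marker label to its D' with the first marker
    threshold         : hypothetical cut-off, not taken from the disclosure
    """
    return sorted(
        marker
        for marker, d_prime in d_prime_by_marker.items()
        if abs(d_prime) >= threshold
    )
```

The absolute value is used so that strong negative disequilibrium is retained as well as strong 
positive disequilibrium. 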

Methods to identify biallelic markers and to conduct linkage disequilibrium analysis are 
described herein and can be carried out by the skilled person without undue experimentation. The 
present invention then also concerns biallelic markers which are in linkage disequilibrium with the 
specific biallelic markers A1 to A58, preferably one of the biallelic markers A1 to A27, A34, A37 to 
A41, A43 to A49, A52, and A54 to A58, more preferably one of the biallelic markers A1, A4, A16, 
A30, A31, A42, A50, A51, and A53, and which are expected to present similar characteristics in 
terms of their respective association with a given trait. In a preferred embodiment, the invention 
concerns biallelic markers which are in linkage disequilibrium with the specific biallelic markers 
Identification Of Functional Mutations 
Mutations in the BAP28 gene which are responsible for a detectable phenotype or trait may 
be identified by comparing the sequences of the BAP28 gene from trait positive and control 
individuals. Once a positive association is confirmed with a biallelic marker of the present 
invention, the identified locus can be scanned for mutations. In a preferred embodiment, functional 
regions such as exons and splice sites, promoters and other regulatory regions of the BAP28 gene are 
scanned for mutations. In a preferred embodiment the sequence of the BAP28 gene is compared in 
trait positive and control individuals. Preferably, trait positive individuals carry the haplotype 
shown to be associated with the trait and trait negative individuals do not carry the haplotype or 
allele associated with the trait. The detectable trait or phenotype may comprise a variety of 
manifestations of altered BAP28 function. 

The mutation detection procedure is essentially similar to that used for biallelic marker 
identification. The method used to detect such mutations generally comprises the following steps: 

- amplification of a region of the BAP28 gene comprising a biallelic marker or a group of 
biallelic markers associated with the trait from DNA samples of trait positive patients and 
trait-negative controls; 

- sequencing of the amplified region; 

- comparison of DNA sequences from trait positive and control individuals; 

- determination of mutations specific to trait-positive patients. 
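
By way of illustration only, the comparison and determination steps can be sketched as 
follows. The sketch assumes that the sequenced regions have already been aligned to a common 
length and that each individual is represented by a single sequence string; both are 
simplifications, and the function name is hypothetical. 

```python
def trait_specific_variants(case_seqs, control_seqs):
    """List (position, base) pairs seen in trait-positive sequences but in no control.

    case_seqs, control_seqs : pre-aligned sequences of equal length (assumption)
    Positions are 0-based offsets into the aligned region.
    """
    case_seqs, control_seqs = list(case_seqs), list(control_seqs)
    length = len(case_seqs[0])
    if any(len(seq) != length for seq in case_seqs + control_seqs):
        raise ValueError("all sequences must be aligned to the same length")
    variants = []
    for pos in range(length):
        control_bases = {seq[pos] for seq in control_seqs}
        case_only = {seq[pos] for seq in case_seqs} - control_bases
        for base in sorted(case_only):
            variants.append((pos, base))
    return variants
```

Variants returned by such a comparison are only candidates; as set out in this section, they 
would still be verified by screening a larger population of cases and controls. 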

In one embodiment, said biallelic marker is selected from the group consisting of A1 to 
A58, and the complements thereof. In a preferred embodiment, said biallelic marker is selected from 
the group consisting of A1 to A27, A34, A37 to A41, A43 to A49, A52, and A54 to A58. In a more 
preferred embodiment, said biallelic marker is selected from the group consisting of A1, A4, A16, 
A30, A31, A42, A50, A51, and A53. It is preferred that candidate polymorphisms be then verified 
by screening a larger population of cases and controls by means of any genotyping procedure such 
as those described herein, preferably using a microsequencing technique in an individual test format. 
Polymorphisms are considered as candidate mutations when present in cases and controls at 
frequencies compatible with the expected association results. Polymorphisms are considered as 
candidate "trait-causing" mutations when they exhibit a statistically significant correlation with the 
detectable phenotype. 

Biallelic Markers Of The Invention In Methods Of Genetic Diagnostics 
The biallelic markers of the present invention can also be used to develop diagnostic tests 
capable of identifying individuals who express a detectable trait as the result of a specific genotype 
or individuals whose genotype places them at risk of developing a detectable trait at a subsequent 
time. The trait analyzed using the present diagnostics may be any detectable trait, including 
susceptibility to prostate cancer, the level of aggressiveness of prostate cancer tumors, an early onset 
of prostate cancer, a beneficial response to or side effects related to treatment against prostate 
cancer. Such a diagnosis can be useful in the staging, monitoring, prognosis and/or prophylactic or 
curative therapy of prostate cancer. 

The diagnostic techniques of the present invention may employ a variety of methodologies 
to determine whether a test subject has a biallelic marker pattern associated with an increased risk of 
developing a detectable trait or whether the individual suffers from a detectable trait as a result of a 
particular mutation, including methods which enable the analysis of individual chromosomes for 
haplotyping, such as family studies, single sperm DNA analysis or somatic hybrids. 

The present invention provides diagnostic methods to determine whether an individual is at 
risk of developing a disease or suffers from a disease resulting from a mutation or a polymorphism 
in the BAP28 gene. The present invention also provides methods to determine whether an individual 
has a susceptibility to prostate cancer. 

These methods involve obtaining a nucleic acid sample from the individual and 
determining whether the nucleic acid sample contains at least one allele or at least one biallelic 
marker haplotype indicative of a risk of developing the trait or indicative that the individual 
expresses the trait as a result of possessing a particular BAP28 polymorphism or mutation 
(trait-causing allele). 

Preferably, in such diagnostic methods, a nucleic acid sample is obtained from the 
individual and this sample is genotyped using methods described above in "Methods Of Genotyping 
DNA Samples For Biallelic markers". The diagnostics may be based on a single biallelic marker or 
on a group of biallelic markers. 

In each of these methods, a nucleic acid sample is obtained from the test subject and the 
biallelic marker pattern of one or more of the biallelic markers A1 to A58, preferably one or more of 
the biallelic markers A1 to A27, A34, A37 to A41, A43 to A49, A52, and A54 to A58, more 
preferably one or more of the biallelic markers A1, A4, A16, A30, A31, A42, A50, A51, and A53, is 
determined. 

In one embodiment, a PCR amplification is conducted on the nucleic acid sample to 
amplify regions in which polymorphisms associated with a detectable phenotype have been 
identified. The amplification products are sequenced to determine whether the individual possesses 
one or more BAP28 polymorphisms associated with a detectable phenotype. The primers used to 
generate amplification products may comprise the primers listed in Table 1. Alternatively, the 
nucleic acid sample is subjected to microsequencing reactions as described above to determine 
whether the individual possesses one or more BAP28 polymorphisms associated with a detectable 
phenotype resulting from a mutation or a polymorphism in the BAP28 gene. The primers used in the 
microsequencing reactions may include the primers listed in Table 4. In another embodiment, the 
nucleic acid sample is contacted with one or more allele specific oligonucleotide probes which 
specifically hybridize to one or more BAP28 alleles associated with a detectable phenotype. The 
probes used in the hybridization assay may include the probes listed in Table 3. In another 
embodiment, the nucleic acid sample is contacted with a second BAP28 oligonucleotide capable of 
producing an amplification product when used with the allele specific oligonucleotide in an 
amplification reaction. The presence of an amplification product in the amplification reaction 
indicates that the individual possesses one or more BAP28 alleles associated with a detectable 
phenotype. 

In a preferred embodiment, the identity of the nucleotide present at at least one biallelic 
marker selected from the group consisting of A1 to A58 and the complements thereof, preferably A1 
to A27, A34, A37 to A41, A43 to A49, A52, and A54 to A58, more preferably A1, A4, A16, A30, 
A31, A42, A50, A51, and A53, and the complements thereof, is determined and the detectable trait 
is prostate cancer, more preferably sporadic prostate cancer. Diagnostic kits comprise any of the 
polynucleotides of the present invention. 

These diagnostic methods are extremely valuable as they can, in certain circumstances, be 
used to initiate preventive treatments or to allow an individual carrying a significant haplotype to 
foresee warning signs such as minor symptoms. 

Diagnostics, which analyze and predict response to a drug or side effects to a drug, may be 
used to determine whether an individual should be treated with a particular drug. For example, if the 
diagnostic indicates a likelihood that an individual will respond positively to treatment with a 
particular drug, the drug may be administered to the individual. Conversely, if the diagnostic 
indicates that an individual is likely to respond negatively to treatment with a particular drug, an 
alternative course of treatment may be prescribed. A negative response may be defined as either the 
absence of an efficacious response or the presence of toxic side effects. 

Clinical drug trials represent another application for the markers of the present invention. 
One or more markers indicative of response to an agent acting against prostate cancer or to side 
effects to an agent acting against prostate cancer may be identified using the methods described 
above. Thereafter, potential participants in clinical trials of such an agent may be screened to 
identify those individuals most likely to respond favorably to the drug and exclude those likely to 
experience side effects. In that way, the effectiveness of drug treatment may be measured in 
individuals who respond positively to the drug, without lowering the measurement as a result of the 
inclusion of individuals who are unlikely to respond positively in the study and without risking 
undesirable safety problems. 

Treatment Of Prostate Cancer 

As the metastasis of prostate cancer can be fatal, it is important to detect prostate cancer 
susceptibility of individuals. Consequently, the invention also concerns a method for the treatment 
of prostate cancer comprising the following steps: 

- selecting an individual whose DNA comprises alleles of a biallelic marker or of a 
group of biallelic markers, preferably BAP28-related markers, associated with prostate cancer; 

- following up said individual for the appearance (and optionally the development) of 
tumors in prostate; and 

- administering an effective amount of a medicament acting against prostate cancer to 
said individual at an appropriate stage of the prostate cancer. 

In one embodiment, said biallelic marker is selected from the group consisting of A1 to 
A58, and the complements thereof. In a preferred embodiment, said biallelic marker is selected from 
the group consisting of A1 to A27, A34, A37 to A41, A43 to A49, A52, and A54 to A58 and the 
complements thereof. In a preferred embodiment, said biallelic marker is selected from the group 
consisting of A1, A4, A16, A30, A31, A42, A50, A51, and A53, and the complements thereof. 

The prophylactic administration of a treatment serves to prevent, attenuate or inhibit the 
growth of cancer cells. 

Another embodiment of the present invention consists of a method for the treatment of 
prostate cancer comprising the following steps: 

- selecting an individual whose DNA comprises alleles of a biallelic marker or of a 
group of biallelic markers, preferably BAP28-related markers, associated with prostate cancer; 

- administering to said individual, preferably as a preventive treatment of prostate 
cancer, an effective amount of a medicament acting against prostate cancer such as 4HPR. 

In one embodiment, said biallelic marker is selected from the group consisting of A1 to 
A58, and the complements thereof. In a preferred embodiment, said biallelic marker is selected from 
the group consisting of A1 to A27, A34, A37 to A41, A43 to A49, A52, and A54 to A58 and the 
complements thereof. In a preferred embodiment, said biallelic marker is selected from the group 
consisting of A1, A4, A16, A30, A31, A42, A50, A51, and A53, and the complements thereof. 

In a further embodiment, the present invention concerns a method for the treatment of 
prostate cancer comprising the following steps: 

- selecting an individual whose DNA comprises alleles of a biallelic marker or of a 
group of biallelic markers, preferably BAP28-related markers, associated with a susceptibility 
to prostate cancer; 

- administering to said individual, as a preventive treatment of prostate cancer, an 
effective amount of a medicament acting against prostate cancer such as 4HPR; 

- following up said individual for the appearance and the development of tumors in 
prostate; and optionally 

- administering an effective amount of a medicament acting against prostate cancer to 
said individual at the appropriate stage of the prostate cancer. 

In one embodiment, said biallelic marker is selected from the group consisting of A1 to 
A58, and the complements thereof. In a preferred embodiment, said biallelic marker is selected from 
the group consisting of A1 to A27, A34, A37 to A41, A43 to A49, A52, and A54 to A58 and the 
complements thereof. In a preferred embodiment, said biallelic marker is selected from the group 
consisting of A1, A4, A16, A30, A31, A42, A50, A51, and A53, and the complements thereof. 

To inform the choice of the appropriate time to begin the treatment of prostate cancer, 
the present invention also concerns a method for the treatment of prostate cancer comprising the 
following steps: 

- selecting an individual suffering from a prostate cancer whose DNA comprises 
alleles of a biallelic marker or of a group of biallelic markers, preferably BAP28-related 
markers, associated with the aggressiveness of prostate cancer tumors; and 

- administering an effective amount of a medicament acting against prostate cancer to 
said individual. 

In one embodiment, said biallelic marker is selected from the group consisting of A1 to 
A58, and the complements thereof. In a preferred embodiment, said biallelic marker is selected from 
the group consisting of A1 to A27, A34, A37 to A41, A43 to A49, A52, and A54 to A58 and the 
complements thereof. In a preferred embodiment, said biallelic marker is selected from the group 
consisting of A1, A4, A16, A30, A31, A42, A50, A51, and A53, and the complements thereof. In 
particular embodiments, the individual is selected by genotyping one or more biallelic markers of 
the present invention. 

Recombinant Vectors 

The term "vector" is used herein to designate either a circular or a linear DNA or RNA 
molecule, which is either double-stranded or single-stranded, and which comprises at least one 
polynucleotide of interest that is sought to be transferred into a cell host or into a unicellular or 
multicellular host organism. 

The present invention encompasses a family of recombinant vectors that comprise a 
regulatory polynucleotide derived from the BAP28 genomic sequence, and/or a coding 
polynucleotide from either the BAP28 genomic sequence or the cDNA sequence. 

Generally, a recombinant vector of the invention may comprise any of the polynucleotides 
described herein, including regulatory sequences, coding sequences and polynucleotide constructs, 
as well as any BAP28 primer or probe as defined above. More particularly, the recombinant vectors 
of the present invention can comprise any of the polynucleotides described in the "Genomic 
Sequences Of The BAP28 Gene" section, the "BAP28 cDNA Sequences" section, the "Coding 
Regions" section, the "Polynucleotide constructs" section, and the "Oligonucleotide Probes And 
Primers" section. 

In a first preferred embodiment, a recombinant vector of the invention is used to amplify 
the inserted polynucleotide derived from a BAP28 genomic sequence of SEQ ID No 1 or a BAP28 
cDNA, for example the cDNA of SEQ ID No 2, 3 or 4, in a suitable cell host, this polynucleotide 
being amplified each time that the recombinant vector replicates. 

A second preferred embodiment of the recombinant vectors according to the invention 
consists of expression vectors comprising either a regulatory polynucleotide or a coding nucleic acid 
of the invention, or both. Within certain embodiments, expression vectors are employed to express 
the BAP28 polypeptide which can then be purified and, for example, be used in ligand screening 
assays or as an immunogen in order to raise specific antibodies directed against the BAP28 protein. 
In other embodiments, the expression vectors are used for constructing transgenic animals and also 
for gene therapy. Expression requires that appropriate signals are provided in the vectors, said 
signals including various regulatory elements, such as enhancers/promoters from both viral and 
mammalian sources that drive expression of the genes of interest in host cells. Dominant drug 
selection markers for establishing permanent, stable cell clones expressing the products are generally 
included in the expression vectors of the invention, as are elements that link expression of the 
drug selection markers to expression of the polypeptide. 

In a further embodiment, the invention concerns a vector comprising a polynucleotide 
sequence selected from the group consisting of SEQ ID Nos 4, and 9-13, a complementary sequence 
thereto or a fragment thereof. 

More particularly, the present invention relates to expression vectors which include nucleic 
acids encoding a BAP28 protein, preferably the BAP28 protein of the amino acid sequence of SEQ 
ID No 5 or variants or fragments thereof. 

The invention also pertains to a recombinant expression vector useful for the expression of 
the BAP28 coding sequence, wherein said vector comprises a nucleic acid of SEQ ID No 2 or 3. 

Recombinant vectors comprising a nucleic acid containing a BAP28-related biallelic 
marker are also part of the invention. In a preferred embodiment, said biallelic marker is selected 
from the group consisting of A1 to A58, preferably A1 to A27, A34, A37 to A41, A43 to A49, A52, 
and A54 to A58, more preferably A1, A4, A16, A30, A31, A42, A50, A51, and A53, and the 
complements thereof. 

Some of the elements which can be found in the vectors of the present invention are 
described in further detail in the following sections. 

The present invention also encompasses primary, secondary, and immortalized 
homologously recombinant host cells of vertebrate origin, preferably mammalian origin and 
particularly human origin, that have been engineered to: a) insert exogenous (heterologous) 
polynucleotides into the endogenous chromosomal DNA of a targeted gene, b) delete endogenous 
chromosomal DNA, and/or c) replace endogenous chromosomal DNA with exogenous 
polynucleotides. Insertions, deletions, and/or replacements of polynucleotide sequences may be to 
the coding sequences of the targeted gene and/or to regulatory regions, such as promoter and 
enhancer sequences, operably associated with the targeted gene. 

The present invention further relates to a method of making a homologously recombinant 
host cell in vitro or in vivo, wherein the expression of a targeted gene not normally expressed in the 
cell is altered. Preferably the alteration causes expression of the targeted gene under normal growth 
conditions or under conditions suitable for producing the polypeptide encoded by the targeted gene. 
The method comprises the steps of: (a) transfecting the cell in vitro or in vivo with a polynucleotide 
construct, the polynucleotide construct comprising: (i) a targeting sequence; (ii) a regulatory 
sequence and/or a coding sequence; and (iii) an unpaired splice donor site, if necessary, thereby 
producing a transfected cell; and (b) maintaining the transfected cell in vitro or in vivo under 
conditions appropriate for homologous recombination. 

The present invention further relates to a method of altering the expression of a targeted 
gene in a cell in vitro or in vivo wherein the gene is not normally expressed in the cell, comprising 
the steps of: (a) transfecting the cell in vitro or in vivo with a polynucleotide construct, the 
polynucleotide construct comprising: (i) a targeting sequence; (ii) a regulatory sequence and/or a 
coding sequence; and (iii) an unpaired splice donor site, if necessary, thereby producing a 
transfected cell; (b) maintaining the transfected cell in vitro or in vivo under conditions 
appropriate for homologous recombination, thereby producing a homologously recombinant cell; 
and (c) maintaining the homologously recombinant cell in vitro or in vivo under conditions 
appropriate for expression of the gene. 

The present invention further relates to a method of making a polypeptide of the present 
invention by altering the expression of a targeted endogenous gene in a cell in vitro or in vivo 
wherein the gene is not normally expressed in the cell, comprising the steps of: (a) transfecting the 
cell in vitro with a polynucleotide construct, the polynucleotide construct comprising: (i) a 
targeting sequence; (ii) a regulatory sequence and/or a coding sequence; and (iii) an unpaired splice 
donor site, if necessary, thereby producing a transfected cell; (b) maintaining the transfected cell in 
vitro or in vivo under conditions appropriate for homologous recombination, thereby producing a 
homologously recombinant cell; and (c) maintaining the homologously recombinant cell in vitro or in 
vivo under conditions appropriate for expression of the gene thereby making the polypeptide. 

The present invention further relates to a polynucleotide construct which alters the 
expression of a targeted gene in a cell type in which the gene is not normally expressed. This occurs 
when the polynucleotide construct is inserted into the chromosomal DNA of the target cell, 
wherein the polynucleotide construct comprises: a) a targeting sequence; b) a regulatory sequence 
and/or coding sequence; and c) an unpaired splice-donor site, if necessary. Further included are 
polynucleotide constructs, as described above, wherein the construct further comprises a 
polynucleotide which encodes a polypeptide and is in-frame with the targeted endogenous gene after 
homologous recombination with chromosomal DNA. 

The compositions may be produced, and methods performed, by techniques known in the 
art, such as those described in U.S. Patent Nos: 6,054,288; 6,048,729; 6,048,724; 6,048,524; 
5,994,127; 5,968,502; 5,965,125; 5,869,239; 5,817,789; 5,783,385; 5,733,761; 5,641,670; 5,580,734; 
International Publication Nos: WO 96/29411, WO 94/12650; and scientific articles including 1994; 
Koller et al., Proc. Natl. Acad. Sci. USA 86:8932-8935 (1989) (the disclosures of each of which are 
incorporated by reference in their entireties). 

1. General features of the expression vectors of the invention 

A recombinant vector according to the invention comprises, but is not limited to, a YAC 
(Yeast Artificial Chromosome), a BAC (Bacterial Artificial Chromosome), a phage, a phagemid, a 
cosmid, a plasmid or even a linear DNA molecule which may comprise a chromosomal, 
non-chromosomal, semi-synthetic and synthetic DNA. Such a recombinant vector can comprise a 
transcriptional unit comprising an assembly of: 

(1) a genetic element or elements having a regulatory role in gene expression, for example 
promoters or enhancers. Enhancers are cis-acting elements of DNA, usually from about 10 to 300 
bp in length that act on the promoter to increase the transcription. 

(2) a structural or coding sequence which is transcribed into mRNA and eventually 
translated into a polypeptide, said structural or coding sequence being operably linked to the 
regulatory elements described in (1); and 

(3) appropriate transcription initiation and termination sequences. Structural units intended 
for use in yeast or eukaryotic expression systems preferably include a leader sequence enabling 
extracellular secretion of translated protein by a host cell. Alternatively, when a recombinant protein 
is expressed without a leader or transport sequence, it may include an N-terminal residue. This 
residue may or may not be subsequently cleaved from the expressed recombinant protein to provide 
a final product. 

Generally, recombinant expression vectors will include origins of replication, selectable 
markers permitting transformation of the host cell, and a promoter derived from a highly expressed 
gene to direct transcription of a downstream structural sequence. The heterologous structural 
sequence is assembled in appropriate phase with translation initiation and termination sequences, 
and preferably a leader sequence capable of directing secretion of the translated protein into the 
periplasmic space or the extracellular medium. In a specific embodiment wherein the vector is 
adapted for transfecting and expressing desired sequences in mammalian host cells, preferred vectors 
will comprise an origin of replication in the desired host, a suitable promoter and enhancer, and also 
any necessary ribosome binding sites, polyadenylation site, splice donor and acceptor sites, 
transcriptional termination sequences, and 5'-flanking non-transcribed sequences. DNA sequences 
derived from the SV40 viral genome, for example SV40 origin, early promoter, enhancer, splice and 
polyadenylation sites may be used to provide the required non-transcribed genetic elements. 

The in vivo expression of a BAP28 polypeptide of SEQ ID No 5 or fragments or variants 
thereof may be useful in order to correct a genetic defect related to the expression of the native gene 
in a host organism or to the production of a biologically inactive BAP28 protein. 

Consequently, the present invention also deals with recombinant expression vectors mainly
designed for the in vivo production of the BAP28 polypeptide of SEQ ID No 5 or fragments or
variants thereof by the introduction of the appropriate genetic material in the organism of the patient
to be treated. This genetic material may be introduced in vitro into a cell that has been previously
extracted from the organism, the modified cell being subsequently reintroduced into the said organism,
or directly in vivo into the appropriate tissue.

2. Regulatory Elements 

Promoters 

The suitable promoter regions used in the expression vectors according to the present
invention are chosen taking into account the cell host in which the heterologous gene has to be
expressed. The particular promoter employed to control the expression of a nucleic acid sequence of
interest is not believed to be important, so long as it is capable of directing the expression of the
nucleic acid in the targeted cell. Thus, where a human cell is targeted, it is preferable to position the
nucleic acid coding region adjacent to and under the control of a promoter that is capable of being
expressed in a human cell, such as, for example, a human or a viral promoter.

A suitable promoter may be heterologous with respect to the nucleic acid for which it
controls the expression or alternatively can be endogenous to the native polynucleotide containing
the coding sequence to be expressed. Additionally, the promoter is generally heterologous with
respect to the recombinant vector sequences within which the construct promoter/coding sequence
has been inserted.

Promoter regions can be selected from any desired gene using, for example, CAT
(chloramphenicol acetyltransferase) vectors and more preferably pKK232-8 and pCM7 vectors.

Preferred bacterial promoters are the LacI, LacZ, the T3 or T7 bacteriophage RNA
polymerase promoters, the gpt, lambda PR, PL and trp promoters (EP 0036776), the polyhedrin
promoter, or the p10 protein promoter from baculovirus (Kit Novagen) (Smith et al., 1983; O'Reilly
et al., 1992), the lambda PR promoter or also the trc promoter.

Eukaryotic promoters include CMV immediate early, HSV thymidine kinase, early and
late SV40, LTRs from retrovirus, and mouse metallothionein-I. Selection of a convenient vector
and promoter is well within the level of ordinary skill in the art.

The choice of a promoter is well within the ability of a person skilled in the field of genetic
engineering. For example, one may refer to the book of Sambrook et al. (1989) or also to the
procedures described by Fuller et al. (1996).

Other regulatory elements

Where a cDNA insert is employed, one will typically desire to include a polyadenylation
signal to effect proper polyadenylation of the gene transcript. The nature of the polyadenylation
signal is not believed to be crucial to the successful practice of the invention, and any such sequence
may be employed, such as human growth hormone and SV40 polyadenylation signals. Also
contemplated as an element of the expression cassette is a terminator. These elements can serve to
enhance message levels and to minimize read-through from the cassette into other sequences.

3. Selectable Markers 

Such markers would confer an identifiable change to the cell permitting easy identification
of cells containing the expression construct. The selectable marker genes for selection of
transformed host cells are preferably dihydrofolate reductase or neomycin resistance for eukaryotic
cell culture, TRP1 for S. cerevisiae or tetracycline, rifampicin or ampicillin resistance in E. coli, or
levan saccharase for mycobacteria, this latter marker being a negative selection marker.

4. Preferred Vectors

Bacterial vectors 

As a representative but non-limiting example, useful expression vectors for bacterial use
can comprise a selectable marker and a bacterial origin of replication derived from commercially
available plasmids comprising genetic elements of pBR322 (ATCC 37017). Such commercial
vectors include, for example, pKK223-3 (Pharmacia, Uppsala, Sweden), and GEM1 (Promega
Biotec, Madison, WI, USA).

Large numbers of other suitable vectors are known to those of skill in the art, and
commercially available, such as the following bacterial vectors: pQE70, pQE60, pQE-9 (Qiagen),
pbs, pD10, phagescript, psiX174, pbluescript SK, pbsks, pNH8A, pNH16A, pNH18A, pNH46A
(Stratagene); ptrc99a, pKK223-3, pKK233-3, pDR540, pRIT5 (Pharmacia); pWLNEO, pSV2CAT,
pOG44, pXT1, pSG (Stratagene); pSVK3, pBPV, pMSG, pSVL (Pharmacia); pQE-30
(QIAexpress).

Bacteriophage vectors 

The P1 bacteriophage vector may contain large inserts ranging from about 80 to about 100
kb.

The construction of P1 bacteriophage vectors such as p158 or p158/neo8 is notably
described by Sternberg (1992, 1994). Recombinant P1 clones comprising BAP28 nucleotide
sequences may be designed for inserting large polynucleotides of more than 40 kb (Linton et al.,
1993). To generate P1 DNA for transgenic experiments, a preferred protocol is the protocol
described by McCormick et al. (1994). Briefly, E. coli (preferably strain NS3529) harboring the P1
plasmid are grown overnight in a suitable broth medium containing 25 ng/ml of kanamycin. The P1
DNA is prepared from the E. coli by alkaline lysis using the Qiagen Plasmid Maxi kit (Qiagen,
Chatsworth, CA, USA), according to the manufacturer's instructions. The P1 DNA is purified from
the bacterial lysate on two Qiagen-tip 500 columns, using the washing and elution buffers contained
in the kit. A phenol/chloroform extraction is then performed before precipitating the DNA with 70%
ethanol. After solubilizing the DNA in TE (10 mM Tris-HCl, pH 7.4, 1 mM EDTA), the
concentration of the DNA is assessed by spectrophotometry.
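
The spectrophotometric assessment mentioned above is commonly done by reading absorbance at 260 nm. The sketch below assumes the usual conversion for double-stranded DNA (an A260 of 1.0 in a 1 cm path corresponds to about 50 µg/ml); the function name and the example reading are hypothetical and are not taken from the cited protocol.

```python
def dsdna_concentration_ug_per_ml(a260: float,
                                  dilution_factor: float = 1.0,
                                  path_length_cm: float = 1.0) -> float:
    """Estimate dsDNA concentration from absorbance at 260 nm.

    Assumes about 50 ug/ml of double-stranded DNA per A260 unit at 1 cm.
    """
    if a260 < 0 or dilution_factor <= 0 or path_length_cm <= 0:
        raise ValueError("absorbance must be >= 0; other arguments > 0")
    return a260 * 50.0 * dilution_factor / path_length_cm


# Hypothetical reading: A260 = 0.2 measured on a 1:10 dilution in TE.
concentration = dsdna_concentration_ug_per_ml(0.2, dilution_factor=10)
print(f"{concentration:.1f} ug/ml")
```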

When the goal is to express a P1 clone comprising BAP28 nucleotide sequences in a
transgenic animal, typically in transgenic mice, it is desirable to remove vector sequences from the
P1 DNA fragment, for example by cleaving the P1 DNA at rare-cutting sites within the P1
polylinker (SfiI, NotI or SalI). The P1 insert is then purified from vector sequences on a pulsed-field
agarose gel, using methods similar to those originally reported for the
isolation of DNA from YACs (Schedl et al., 1993a; Peterson et al., 1993). At this stage, the
resulting purified insert DNA can be concentrated, if necessary, on a Millipore Ultrafree-MC Filter
Unit (Millipore, Bedford, MA, USA; 30,000 molecular weight limit) and then dialyzed against
microinjection buffer (10 mM Tris-HCl, pH 7.4; 250 µM EDTA) containing 100 mM NaCl, 30 µM
spermine, 70 µM spermidine on a microdialysis membrane (type VS, 0.025 µm, from Millipore).
The intactness of the purified P1 DNA insert is assessed by electrophoresis on a 1% agarose (SeaKem
GTG; FMC Bio-products) pulsed-field gel and staining with ethidium bromide.

Baculovirus vectors

A suitable vector for the expression of the BAP28 polypeptide of SEQ ID No 5 or
fragments or variants thereof is a baculovirus vector that can be propagated in insect cells and in
insect cell lines. A specific suitable host vector system is the pVL1392/1393 baculovirus transfer
vector (Pharmingen) that is used to transfect the SF9 cell line (ATCC N°CRL 1711) which is
derived from Spodoptera frugiperda.

Other suitable vectors for the expression of the BAP28 polypeptide of SEQ ID No 5 or
fragments or variants thereof in a baculovirus expression system include those described by Chai et
al. (1993), Vlasak et al. (1983) and Lenhard et al. (1996).

Viral vectors

In one specific embodiment, the vector is derived from an adenovirus. Preferred
adenovirus vectors according to the invention are those described by Feldman and Steg (1996) or
Ohno et al. (1994). Another preferred recombinant adenovirus according to this specific embodiment
of the present invention is the human adenovirus type 2 or 5 (Ad 2 or Ad 5) or an adenovirus of
animal origin (French patent application N° FR-93.05954).

Retrovirus vectors and adeno-associated virus vectors are generally understood to be the
recombinant gene delivery systems of choice for the transfer of exogenous polynucleotides in vivo,
particularly to mammals, including humans. These vectors provide efficient delivery of genes into
cells, and the transferred nucleic acids are stably integrated into the chromosomal DNA of the host.

Particularly preferred retroviruses for the preparation or construction of retroviral in vitro
or in vivo gene delivery vehicles of the present invention include retroviruses selected from the
group consisting of Mink-Cell Focus Inducing Virus, Murine Sarcoma Virus, Reticuloendotheliosis
virus and Rous Sarcoma virus. Particularly preferred Murine Leukemia Viruses include the 4070A
and the 1504A viruses, Abelson (ATCC No VR-999), Friend (ATCC No VR-245), Gross (ATCC
No VR-590), Rauscher (ATCC No VR-998) and Moloney Murine Leukemia Virus (ATCC No VR-190;
PCT Application No WO 94/24298). Particularly preferred Rous Sarcoma Viruses include
Bryan high titer (ATCC Nos VR-334, VR-657, VR-726, VR-659 and VR-728). Other preferred
retroviral vectors are those described in Roth et al. (1996), PCT Application No WO 93/25234, PCT
Application No WO 94/06920, Roux et al., 1989, Man et al., 1992 and Neda et al., 1991.

Yet another viral vector system that is contemplated by the invention is the
adeno-associated virus (AAV). The adeno-associated virus is a naturally occurring defective virus that
requires another virus, such as an adenovirus or a herpes virus, as a helper virus for efficient
replication and a productive life cycle (Muzyczka et al., 1992). It is also one of the few viruses that
may integrate its DNA into non-dividing cells, and exhibits a high frequency of stable integration
(Flotte et al., 1992; Samulski et al., 1989; McLaughlin et al., 1989). One advantageous feature of
AAV derives from its reduced efficacy for transducing primary cells relative to transformed cells.

BAC vectors

The bacterial artificial chromosome (BAC) cloning system (Shizuya et al., 1992) has been
developed to stably maintain large fragments of genomic DNA (100-300 kb) in E. coli. A preferred
BAC vector is the pBeloBAC11 vector that has been described by Kim et al. (1996). BAC
libraries are prepared with this vector using size-selected genomic DNA that has been partially
digested using enzymes that permit ligation into either the BamHI or HindIII sites in the vector.
Flanking these cloning sites are T7 and SP6 RNA polymerase transcription initiation sites that can
be used to generate end probes by either RNA transcription or PCR methods. After the construction
of a BAC library in E. coli, BAC DNA is purified from the host cell as a supercoiled circle.
Converting these circular molecules into a linear form precedes both size determination and
introduction of the BACs into recipient cells. The cloning site is flanked by two NotI sites,
permitting cloned segments to be excised from the vector by NotI digestion. Alternatively, the
DNA insert contained in the pBeloBAC11 vector may be linearized by treatment of the BAC vector
with the commercially available enzyme lambda terminase that leads to the cleavage at the unique
cosN site, but this cleavage method results in a full length BAC clone containing both the insert
DNA and the BAC sequences.

5. Delivery Of The Recombinant Vectors

In order to effect expression of the polynucleotides and polynucleotide constructs of the
invention, these constructs must be delivered into a cell. This delivery may be accomplished in
vitro, as in laboratory procedures for transforming cell lines, or in vivo or ex vivo, as in the treatment
of certain disease states.

One mechanism is viral infection where the expression construct is encapsulated in an
infectious viral particle.

Several non-viral methods for the transfer of polynucleotides into cultured mammalian
cells are also contemplated by the present invention, and include, without being limited to, calcium
phosphate precipitation (Graham et al., 1973; Chen et al., 1987), DEAE-dextran (Gopal, 1985),
electroporation (Tur-Kaspa et al., 1986; Potter et al., 1984), direct microinjection (Harland et al.,
1985), DNA-loaded liposomes (Nicolau et al., 1982; Fraley et al., 1979), and receptor-mediated
transfection (Wu and Wu, 1987; 1988). Some of these techniques may be successfully adapted for
in vivo or ex vivo use.

Once the expression polynucleotide has been delivered into the cell, it may be stably
integrated into the genome of the recipient cell. This integration may be in the cognate location and
orientation via homologous recombination (gene replacement) or it may be integrated in a random,
non-specific location (gene augmentation). In yet further embodiments, the nucleic acid may be
stably maintained in the cell as a separate, episomal segment of DNA. Such nucleic acid segments
or "episomes" encode sequences sufficient to permit maintenance and replication independent of or
in synchronization with the host cell cycle.

One specific embodiment for a method for delivering a protein or peptide to the interior of
a cell of a vertebrate in vivo comprises the step of introducing a preparation comprising a
physiologically acceptable carrier and a naked polynucleotide operatively coding for the polypeptide
of interest into the interstitial space of a tissue comprising the cell, whereby the naked
polynucleotide is taken up into the interior of the cell and has a physiological effect. This is
particularly applicable for transfer in vitro but it may be applied to in vivo as well.

Compositions for use in vitro and in vivo comprising a "naked" polynucleotide are
described in PCT application N° WO 90/11092 (Vical Inc.) and also in PCT application No WO
95/11307 (Institut Pasteur, INSERM, Universite d'Ottawa) as well as in the articles of Tascon et
al. (1996) and of Huygen et al. (1996).

In still another embodiment of the invention, the transfer of a naked polynucleotide of the
invention, including a polynucleotide construct of the invention, into cells may be carried out by
particle bombardment (biolistic), said particles being DNA-coated microprojectiles accelerated to a
high velocity allowing them to pierce cell membranes and enter cells without killing them, such as
described by Klein et al. (1987).

In a further embodiment, the polynucleotide of the invention may be entrapped in a
liposome (Ghosh and Bachhawat, 1991; Wong et al., 1980; Nicolau et al., 1987).

In a specific embodiment, the invention provides a composition for the in vivo production
of the BAP28 protein or polypeptide described herein. It comprises a naked polynucleotide
operatively coding for this polypeptide, in solution in a physiologically acceptable carrier, and
suitable for introduction into a tissue to cause cells of the tissue to express the said protein or
polypeptide.

The amount of vector to be injected into the desired host organism varies according to the
site of injection. As an indicative dose, between 0.1 and 100 µg of the vector will be injected into an
animal body, preferably a mammal body, for example a mouse body.

In another embodiment, the vector according to the invention may be introduced in
vitro into a host cell, preferably a host cell previously harvested from the animal to be treated and
more preferably a somatic cell such as a muscle cell. In a subsequent step, the cell that has been
transformed with the vector coding for the desired BAP28 polypeptide or the desired fragment
thereof is reintroduced into the animal body in order to deliver the recombinant protein within the
body either locally or systemically.

Cell Hosts

Another object of the invention consists of a host cell that has been transformed or
transfected with one of the polynucleotides described herein, and in particular a polynucleotide
either comprising a BAP28 regulatory polynucleotide or the coding sequence of the BAP28
polypeptide of SEQ ID Nos 1, 2, 3 or 4 or a fragment or a variant thereof. Also included are host
cells that are transformed (prokaryotic cells) or that are transfected (eukaryotic cells) with a
recombinant vector such as one of those described above. More particularly, the cell hosts of the
present invention can comprise any of the polynucleotides described in the "Genomic Sequences Of
The BAP28 Gene" section, the "BAP28 cDNA Sequences" section, the "Coding Regions" section,
the "Polynucleotide constructs" section, and the "Oligonucleotide Probes And Primers" section.

A further recombinant cell host according to the invention comprises a polynucleotide
containing a biallelic marker selected from the group consisting of A1 to A58, preferably A1 to A27,
A34, A37 to A41, A43 to A49, A52, and A54 to A58, more preferably A1, A4, 16, A30, A31, A42,
A50, A51, and A53, and the complements thereof.

Preferred host cells used as recipients for the expression vectors of the invention are the 
following: 

a) Prokaryotic host cells: Escherichia coli strains (i.e. DH5-α strain), Bacillus subtilis,
Salmonella typhimurium, and strains from species like Pseudomonas, Streptomyces and
Staphylococcus.

b) Eukaryotic host cells: HeLa cells (ATCC N°CCL2; N°CCL2.1; N°CCL2.2), CV-1 cells
(ATCC N°CCL70), COS cells (ATCC N°CRL1650; N°CRL1651), Sf-9 cells (ATCC N°CRL1711),
C127 cells (ATCC N° CRL-1804), 3T3 (ATCC N° CRL-6361), CHO (ATCC N° CCL-61), human
kidney 293 (ATCC N° 45504; N° CRL-1573) and BHK (ECACC N° 84100501; N° 84111301).

c) Other mammalian host cells. 

BAP28 gene expression in mammalian, and typically human, cells may be rendered
defective, or alternatively a BAP28 genomic or cDNA sequence may be inserted, with the
replacement of the BAP28 gene counterpart in the genome of an animal cell by a
BAP28 polynucleotide according to the invention. These genetic alterations may be generated by
homologous recombination events using specific DNA constructs that have been previously
described.

One kind of cell host that may be used is mammal zygotes, such as murine zygotes. For
example, murine zygotes may undergo microinjection with a purified DNA molecule of interest, for
example a purified DNA molecule that has previously been adjusted to a concentration range from 1
ng/ml (for BAC inserts) to 3 ng/µl (for P1 bacteriophage inserts) in 10 mM Tris-HCl, pH 7.4, 250 µM
EDTA containing 100 mM NaCl, 30 µM spermine, and 70 µM spermidine. When the DNA to be
microinjected has a large size, polyamines and high salt concentrations can be used in order to avoid
mechanical breakage of this DNA, as described by Schedl et al. (1993b).
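
To relate a mass concentration of the kind quoted above to the number of DNA molecules delivered, the following sketch converts ng/µl into copies per picoliter. It assumes an average mass of about 650 g/mol per base pair of double-stranded DNA; the function name, the 80 kb insert length and the choice of picoliter units are illustrative assumptions and are not values prescribed by the text.

```python
AVOGADRO = 6.02214076e23        # molecules per mole
AVG_BP_MASS_G_PER_MOL = 650.0   # assumed average mass of one base pair


def copies_per_picoliter(conc_ng_per_ul: float, length_bp: int) -> float:
    """Convert a dsDNA mass concentration into molecules per picoliter."""
    if conc_ng_per_ul < 0 or length_bp <= 0:
        raise ValueError("concentration must be >= 0 and length > 0")
    grams_per_ul = conc_ng_per_ul * 1e-9
    moles_per_ul = grams_per_ul / (length_bp * AVG_BP_MASS_G_PER_MOL)
    copies_per_ul = moles_per_ul * AVOGADRO
    return copies_per_ul / 1e6  # 1 ul = 1e6 pl


# Hypothetical example: an 80 kb P1 insert at 3 ng/ul.
print(round(copies_per_picoliter(3.0, 80_000), 1))
```

Because the result scales inversely with insert length, a larger insert at the same mass concentration corresponds to proportionally fewer molecules per injected volume.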

Any one of the polynucleotides of the invention, including the DNA constructs described
herein, may be introduced in an embryonic stem (ES) cell line, preferably a mouse ES cell line. ES
cell lines are derived from pluripotent, uncommitted cells of the inner cell mass of pre-implantation
blastocysts. Preferred ES cell lines are the following: ES-E14TG2a (ATCC n° CRL-1821), ES-D3
(ATCC n° CRL1934 and n° CRL-11632), YS001 (ATCC n° CRL-11776), 36.5 (ATCC n° CRL-11116).
To maintain ES cells in an uncommitted state, they are cultured in the presence of
growth-inhibited feeder cells which provide the appropriate signals to preserve this embryonic phenotype
and serve as a matrix for ES cell adherence. Preferred feeder cells comprise primary embryonic
fibroblasts that are established from tissue of day 13 to day 14 embryos of virtually any mouse strain,
that are maintained in culture, such as described by Abbondanzo et al. (1993), and are inhibited in
growth by irradiation, such as described by Robertson (1987), or by the presence of an inhibitory
concentration of LIF, such as described by Pease and Williams (1990).

The constructs in the host cells can be used in a conventional manner to produce the gene
product encoded by the recombinant sequence.

Following transformation of a suitable host and growth of the host to an appropriate cell
density, the selected promoter is induced by appropriate means, such as temperature shift or
chemical induction, and cells are cultivated for an additional period.

Cells are typically harvested by centrifugation, disrupted by physical or chemical means,
and the resulting crude extract retained for further purification.

Microbial cells employed in the expression of proteins can be disrupted by any convenient
method, including freeze-thaw cycling, sonication, mechanical disruption, or use of cell lysing
agents. Such methods are well known to the skilled artisan.

Transgenic Animals

The terms "transgenic animals" or "host animals" are used herein to designate animals that
have their genome genetically and artificially manipulated so as to include one of the nucleic acids
according to the invention. Preferred animals are non-human mammals and include those belonging
to a genus selected from Mus (e.g. mice), Rattus (e.g. rats) and Oryctolagus (e.g. rabbits) which have
their genome artificially and genetically altered by the insertion of a nucleic acid according to the
invention. In one embodiment, the invention encompasses non-human host mammals and animals
comprising a recombinant vector of the invention or a BAP28 gene disrupted by homologous
recombination with a knock out vector.

The transgenic animals of the invention all include within a plurality of their cells a cloned
recombinant or synthetic DNA sequence, more specifically one of the purified or isolated nucleic
acids comprising a BAP28 coding sequence, a BAP28 regulatory polynucleotide, a polynucleotide
construct, or a DNA sequence encoding an antisense polynucleotide such as described in the present
specification.

Generally, a transgenic animal according to the present invention comprises any one of the
polynucleotides, the recombinant vectors and the cell hosts described in the present invention. More
particularly, the transgenic animals of the present invention can comprise any of the polynucleotides
described in the "Genomic Sequences Of The BAP28 Gene" section, the "BAP28 cDNA
Sequences" section, the "Coding Regions" section, the "Polynucleotide constructs" section, the
"Oligonucleotide Probes And Primers" section, the "Recombinant Vectors" section and the "Cell
Hosts" section.

Further transgenic animals according to the invention contain in their somatic cells
and/or in their germ line cells a polynucleotide comprising a biallelic marker selected from the group
consisting of A1 to A58, preferably A1 to A27, A34, A37 to A41, A43 to A49, A52, and A54 to
A58, more preferably A1, A4, 16, A30, A31, A42, A50, A51, and A53, and the complements
thereof.

In a first preferred embodiment, these transgenic animals may be good experimental
models in order to study the diverse pathologies related to cell differentiation, in particular
concerning the transgenic animals within the genome of which has been inserted one or several
copies of a polynucleotide encoding a native BAP28 protein, or alternatively a mutant BAP28
protein.

In a second preferred embodiment, these transgenic animals may express a desired
polypeptide of interest under the control of the regulatory polynucleotides of the BAP28 gene,
leading to good yields in the synthesis of this protein of interest, and eventually a tissue specific
expression of this protein of interest.

The design of the transgenic animals of the invention may be made according to the
conventional techniques well known to one skilled in the art. For more details regarding the
production of transgenic animals, and specifically transgenic mice, reference may be made to US Patents
Nos 4,873,191, issued Oct. 10, 1989; 5,464,764, issued Nov. 7, 1995; and 5,789,215, issued Aug. 4,
1998; these documents being herein incorporated by reference to disclose methods of producing
transgenic mice.

Transgenic animals of the present invention are produced by the application of procedures
which result in an animal with a genome that has incorporated exogenous genetic material. The
procedure involves obtaining the genetic material, or a portion thereof, which encodes either a
BAP28 coding sequence, a BAP28 regulatory polynucleotide or a DNA sequence encoding a BAP28
antisense polynucleotide such as described in the present specification.

A recombinant polynucleotide of the invention is inserted into an embryonic stem (ES)
cell line. The insertion is preferably made using electroporation, such as described by Thomas et
al. (1987). The cells subjected to electroporation are screened (e.g. by selection via selectable
markers, by PCR or by Southern blot analysis) to find positive cells which have integrated the
exogenous recombinant polynucleotide into their genome, preferably via a homologous
recombination event. An illustrative positive-negative selection procedure that may be used
according to the invention is described by Mansour et al. (1988).
Then, the positive cells are isolated, cloned and injected into 3.5-day-old blastocysts from
mice, such as described by Bradley (1987). The blastocysts are then inserted into a female host
animal and allowed to grow to term.

Alternatively, the positive ES cells are brought into contact with embryos at the 2.5-day-old
8-16 cell stage (morulae) such as described by Wood et al. (1993) or by Nagy et al. (1993), the ES
cells being internalized to colonize extensively the blastocyst, including the cells which will give
rise to the germ line.

The offspring of the female host are tested to determine which animals are transgenic, i.e.
include the inserted exogenous DNA sequence, and which are wild-type.

Thus, the present invention also concerns a transgenic animal containing a nucleic acid, a
recombinant expression vector or a recombinant host cell according to the invention.

Recombinant Cell Lines Derived From The Transgenic Animals Of The Invention. 

A further object of the invention consists of recombinant host cells obtained from a
transgenic animal described herein. In one embodiment the invention encompasses cells derived
from non-human host mammals and animals comprising a recombinant vector of the invention or a
BAP28 gene disrupted by homologous recombination with a knock out vector.

Recombinant cell lines may be established in vitro from cells obtained from any tissue of a
transgenic animal according to the invention, for example by transfection of primary cell cultures
with vectors expressing onc-genes such as SV40 large T antigen, as described by Chou (1989) and
Shay et al. (1991).

Methods for screening substances interacting with a BAP28 polypeptide

For the purpose of the present invention, a ligand means a molecule, such as a protein, a
peptide, an antibody or any synthetic chemical compound capable of binding to the BAP28 protein
or one of its fragments or variants or of modulating the expression of the polynucleotide coding for
BAP28 or a fragment or variant thereof.

In the ligand screening method according to the present invention, a biological sample or a
defined molecule to be tested as a putative ligand of the BAP28 protein is brought into contact with
the corresponding purified BAP28 protein, for example the corresponding purified recombinant
BAP28 protein produced by a recombinant cell host as described hereinbefore, in order to form a
complex between this protein and the putative ligand molecule to be tested.

As an illustrative example, to study the interaction of the BAP28 protein, or a fragment
comprising a contiguous span of at least 6 amino acids, preferably at least 8 to 10 amino acids, more
preferably at least 12, 15, 20, 25, 30, 40, 50, or 100 amino acids of SEQ ID No 5, wherein said
contiguous span includes either at least 1, 2, 3, 5 or 10 of the amino acid positions selected from the
group consisting of 1 to 1629 of the SEQ ID No 5, or an amino acid selected from the group
consisting of an asparagine at the amino acid position 1694 of SEQ ID No 5, a valine at the amino
acid position 1854 of SEQ ID No 5, an asparagine at the amino acid position 1967 of SEQ ID No 5,
a glutamic acid at the amino acid position 2017 of SEQ ID No 5, and an alanine at the amino acid
position 2050 of SEQ ID No 5, with drugs or small molecules, such as molecules generated through
combinatorial chemistry approaches, the microdialysis coupled to HPLC method described by Wang
et al. (1997) or the affinity capillary electrophoresis method described by Bush et al. (1997), the
disclosures of which are incorporated by reference, can be used.
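
The fragment criteria recited above can be expressed as a small check. The following is a hedged Python sketch only: the function name `fragment_qualifies`, its parameters and the placeholder peptides are hypothetical, the positions are taken to be 1-based positions in SEQ ID No 5, and no actual BAP28 sequence is used.

```python
# Residues named in the text at specific positions of SEQ ID No 5.
NAMED_RESIDUES = {1694: "N", 1854: "V", 1967: "N", 2017: "E", 2050: "A"}
LAST_RANGE_POSITION = 1629  # upper bound of the span "1 to 1629"


def fragment_qualifies(start: int, fragment: str,
                       min_length: int = 6,
                       min_range_positions: int = 1) -> bool:
    """Check a contiguous span against the recited criteria.

    `start` is the 1-based position of the first residue of `fragment`.
    """
    if start < 1 or len(fragment) < min_length:
        return False
    end = start + len(fragment) - 1
    # Number of positions of the span that fall within 1..1629.
    covered = max(0, min(end, LAST_RANGE_POSITION) - start + 1)
    if covered >= min_range_positions:
        return True
    # Otherwise the span must carry one of the named residues in place.
    return any(
        start <= pos <= end and fragment[pos - start] == residue
        for pos, residue in NAMED_RESIDUES.items()
    )


# Placeholder peptides, not real BAP28 sequence.
print(fragment_qualifies(1690, "AAAANAAA"))  # span 1690-1697 with N at 1694
print(fragment_qualifies(1690, "AAAAAAAA"))  # same span without the named residue
```

The `min_length` and `min_range_positions` arguments correspond to the alternative thresholds listed in the text (for example 6, 8, 10 or 12 amino acids, and 1, 2, 3, 5 or 10 positions).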

In further methods, peptides, drugs, fatty acids, lipoproteins, or small molecules which
interact with the BAP28 protein, or a fragment comprising a contiguous span of at least 6 amino
acids, preferably at least 8 to 10 amino acids, more preferably at least 12, 15, 20, 25, 30, 40, 50, or
100 amino acids of SEQ ID No 5, wherein said contiguous span includes either at least 1, 2, 3, 5 or
10 of the amino acid positions selected from the group consisting of 1 to 1629 of the SEQ ID No 5
or an amino acid selected from the group consisting of an asparagine at the amino acid position 1694
of SEQ ID No 5, a valine at the amino acid position 1854 of SEQ ID No 5, an asparagine at the
amino acid position 1967 of SEQ ID No 5, a glutamic acid at the amino acid position 2017 of SEQ
ID No 5, and an alanine at the amino acid position 2050 of SEQ ID No 5, may be identified using
assays such as the following. The molecule to be tested for binding is labeled with a detectable
label, such as a fluorescent, radioactive, or enzymatic tag, and placed in contact with immobilized
BAP28 protein, or a fragment thereof, under conditions which permit specific binding to occur.
After removal of non-specifically bound molecules, bound molecules are detected using appropriate
means.
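
The fragment definition that recurs throughout this section can be checked mechanically. The following Python sketch is illustrative only: the function name `span_qualifies`, its parameters, and the use of one-letter residue codes are assumptions made for the example, and the sequence of SEQ ID No 5 itself is not reproduced here.

```python
# Variant residues named in the text (1-based positions in SEQ ID No 5),
# in one-letter code: N = asparagine, V = valine, E = glutamic acid,
# A = alanine.
VARIANT_RESIDUES = {1694: "N", 1854: "V", 1967: "N", 2017: "E", 2050: "A"}

# Last position of the region "1 to 1629" named in the text.
FIRST_REGION_END = 1629


def span_qualifies(sequence, start, length, min_length=6, min_hits=1):
    """Return True if a contiguous span meets the fragment definition.

    `start` is 1-based. The span qualifies when it is at least
    `min_length` residues long and either covers at least `min_hits`
    positions in 1..1629 or carries one of the listed variant residues.
    """
    end = start + length - 1
    if length < min_length or start < 1 or end > len(sequence):
        return False
    positions = range(start, end + 1)
    in_first_region = sum(1 for pos in positions if pos <= FIRST_REGION_END)
    has_variant = any(
        VARIANT_RESIDUES.get(pos) == sequence[pos - 1] for pos in positions
    )
    return in_first_region >= min_hits or has_variant
```

The defaults correspond to the broadest form of the definition (at least 6 amino acids, at least 1 position); the narrower preferred forms are obtained by raising `min_length` or `min_hits`.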

Another object of the present invention consists of methods and kits for the screening of 
candidate substances that interact with BAP28 polypeptide. 

The present invention pertains to methods for screening substances of interest that interact
with a BAP28 protein or a fragment or variant thereof. By their capacity to bind covalently or
non-covalently to a BAP28 protein or to a fragment or variant thereof, these substances or molecules
may be advantageously used both in vitro and in vivo.

In vitro, said interacting molecules may be used as detection means in order to identify the 
presence of a BAP28 protein in a sample, preferably a biological sample. 

A method for the screening of a candidate substance comprises the following steps:

a) providing a polypeptide consisting of a BAP28 protein or a fragment comprising a
contiguous span of at least 6 amino acids, preferably at least 8 to 10 amino acids, more preferably at
least 12, 15, 20, 25, 30, 40, 50, or 100 amino acids of SEQ ID No 5, wherein said contiguous span
includes either at least 1, 2, 3, 5 or 10 of the amino acid positions selected from the group consisting
of 1 to 1629 of the SEQ ID No 5 or an amino acid selected from the group consisting of an
asparagine at the amino acid position 1694 of SEQ ID No 5, a valine at the amino acid position 1854
of SEQ ID No 5, an asparagine at the amino acid position 1967 of SEQ ID No 5, a glutamic acid at
the amino acid position 2017 of SEQ ID No 5, and an alanine at the amino acid position 2050 of
SEQ ID No 5, or a variant thereof;

b) obtaining a candidate substance;

c) bringing into contact said polypeptide with said candidate substance;

d) detecting the complexes formed between said polypeptide and said candidate substance.

The invention further concerns a kit for the screening of a candidate substance interacting with the
BAP28 polypeptide, wherein said kit comprises:

a) a BAP28 protein having an amino acid sequence selected from the group consisting of
the amino acid sequences of SEQ ID No 5 or a peptide fragment comprising a contiguous span of
at least 6 amino acids, preferably at least 8 to 10 amino acids, more preferably at least 12, 15, 20,
25, 30, 40, 50, or 100 amino acids of SEQ ID No 5, wherein said contiguous span includes either at
least 1, 2, 3, 5 or 10 of the amino acid positions selected from the group consisting of 1 to 1629 of
the SEQ ID No 5 or an amino acid selected from the group consisting of an asparagine at the
amino acid position 1694 of SEQ ID No 5, a valine at the amino acid position 1854 of SEQ ID No
5, an asparagine at the amino acid position 1967 of SEQ ID No 5, a glutamic acid at the amino acid
position 2017 of SEQ ID No 5, and an alanine at the amino acid position 2050 of SEQ ID No 5, or
a variant thereof;

b) in some embodiments, the kit may also comprise means useful to detect the complex
formed between the BAP28 protein or a peptide fragment or a variant thereof and the candidate
substance.

In a preferred embodiment of the kit described above, the detection means consist of
monoclonal or polyclonal antibodies directed against the BAP28 protein or a peptide fragment or a
variant thereof.

Various candidate substances or molecules can be assayed for interaction with a BAP28
polypeptide. These substances or molecules include, without being limited to, natural or synthetic
organic compounds or molecules of biological origin such as polypeptides. When the candidate
substance or molecule consists of a polypeptide, this polypeptide may be the resulting expression
product of a phage clone belonging to a phage-based random peptide library, or alternatively the
polypeptide may be the resulting expression product of a cDNA library cloned in a vector suitable
for performing a two-hybrid screening assay.

The invention also pertains to kits useful for performing the hereinbefore described
screening method. Preferably, such kits comprise a BAP28 polypeptide or a fragment or a variant
thereof, and, in some embodiments, means useful to detect the complex formed between the BAP28
polypeptide or its fragment or variant and the candidate substance. In a preferred embodiment the
detection means consist of monoclonal or polyclonal antibodies directed against the corresponding
BAP28 polypeptide or a fragment or a variant thereof.

A. Candidate ligands obtained from random peptide libraries

In a particular embodiment of the screening method, the putative ligand is the expression
product of a DNA insert contained in a phage vector (Parmley and Smith, 1988). Specifically,
random peptide phage libraries are used. The random DNA inserts encode peptides of 8 to 20
amino acids in length (Oldenburg K.R. et al., 1992; Valadon P., et al., 1996; Lucas A.H., 1994;
Westerink M.A.J., 1995; Felici F. et al., 1991). According to this particular embodiment, the
recombinant phages expressing a protein that binds to the immobilized BAP28 protein are retained
and the complex formed between the BAP28 protein and the recombinant phage may be
subsequently immunoprecipitated by a polyclonal or a monoclonal antibody directed against the
BAP28 protein.

Once the ligand library in recombinant phages has been constructed, the phage population
is brought into contact with the immobilized BAP28 protein. Then the preparation of complexes is
washed in order to remove the non-specifically bound recombinant phages. The phages that bind
specifically to the BAP28 protein are then eluted by a buffer (acid pH) or immunoprecipitated by the
monoclonal antibody produced by the hybridoma anti-BAP28, and this phage population is
subsequently amplified by an over-infection of bacteria (for example E. coli). The selection step
may be repeated several times, preferably 2-4 times, in order to select the most specific recombinant
phage clones. The last step consists in characterizing the peptide produced by the selected
recombinant phage clones either by expression in infected bacteria and isolation, by expressing the
phage insert in another host-vector system, or by sequencing the insert contained in the selected
recombinant phages.

B. Candidate ligands obtained by competition experiments. 

Alternatively, peptides, drugs or small molecules which bind to the BAP28 protein, or a
fragment comprising a contiguous span of at least 6 amino acids, preferably at least 8 to 10 amino
acids, more preferably at least 12, 15, 20, 25, 30, 40, 50, or 100 amino acids of SEQ ID No 5,
wherein said contiguous span includes either at least 1, 2, 3, 5 or 10 of the amino acid positions
selected from the group consisting of 1 to 1629 of the SEQ ID No 5 or an amino acid selected from
the group consisting of an asparagine at the amino acid position 1694 of SEQ ID No 5, a valine at
the amino acid position 1854 of SEQ ID No 5, an asparagine at the amino acid position 1967 of SEQ
ID No 5, a glutamic acid at the amino acid position 2017 of SEQ ID No 5, and an alanine at the
amino acid position 2050 of SEQ ID No 5, may be identified in competition experiments. In such
assays, the BAP28 protein, or a fragment thereof, is immobilized to a surface, such as a plastic plate.
Increasing amounts of the peptides, drugs or small molecules are placed in contact with the
immobilized BAP28 protein, or a fragment thereof, in the presence of a detectably labeled known
BAP28 protein ligand. For example, the BAP28 ligand may be detectably labeled with a
fluorescent, radioactive, or enzymatic tag. The ability of the test molecule to bind the BAP28
protein, or a fragment thereof, is determined by measuring the amount of detectably labeled known
ligand bound in the presence of the test molecule. A decrease in the amount of known ligand bound
to the BAP28 protein, or a fragment thereof, when the test molecule is present indicates that the test
molecule is able to bind to the BAP28 protein, or a fragment thereof.
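
The readout of such a competition experiment can be summarized numerically. The Python sketch below is a minimal illustration, not a prescribed analysis: the function names, the data layout, and the 50% displacement threshold are all assumptions chosen for the example.

```python
def percent_displacement(bound_without, bound_with):
    """Percent decrease in labeled-ligand signal caused by the test molecule.

    `bound_without` is the labeled known ligand bound with no test
    molecule present; `bound_with` is the amount bound in its presence.
    """
    if bound_without <= 0:
        raise ValueError("reference signal must be positive")
    return 100.0 * (bound_without - bound_with) / bound_without


def lowest_competing_amount(signals, reference, threshold=50.0):
    """Return the lowest amount of test molecule that displaces at least
    `threshold` percent of the labeled ligand, or None if none does.

    `signals` maps amount of test molecule -> labeled ligand still bound.
    The 50 percent default is an illustrative cutoff, not a value taken
    from the text.
    """
    for amount in sorted(signals):
        if percent_displacement(reference, signals[amount]) >= threshold:
            return amount
    return None
```

For example, with a reference signal of 1000 and signals of 950, 700 and 400 at increasing amounts of test molecule, only the highest amount reaches the 50% threshold.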

C. Candidate ligands obtained by affinity chromatography.

Proteins or other molecules interacting with the BAP28 protein, or a fragment comprising
a contiguous span of at least 6 amino acids, preferably at least 8 to 10 amino acids, more preferably
at least 12, 15, 20, 25, 30, 40, 50, or 100 amino acids of SEQ ID No 5, wherein said contiguous span
includes either at least 1, 2, 3, 5 or 10 of the amino acid positions selected from the group consisting
of 1 to 1629 of the SEQ ID No 5 or an amino acid selected from the group consisting of an
asparagine at the amino acid position 1694 of SEQ ID No 5, a valine at the amino acid position 1854
of SEQ ID No 5, an asparagine at the amino acid position 1967 of SEQ ID No 5, a glutamic acid at
the amino acid position 2017 of SEQ ID No 5, and an alanine at the amino acid position 2050 of
SEQ ID No 5, can also be found using affinity columns which contain the BAP28 protein, or a
fragment thereof. The BAP28 protein, or a fragment thereof, may be attached to the column using
conventional techniques including chemical coupling to a suitable column matrix such as agarose,
Affi Gel®, or other matrices familiar to those of skill in the art. In some embodiments of this method,
the affinity column contains chimeric proteins in which the BAP28 protein, or a fragment thereof, is
fused to glutathione S-transferase (GST). A mixture of cellular proteins or pool of expressed proteins
as described above is applied to the affinity column. Proteins or other molecules interacting with the
BAP28 protein, or a fragment thereof, attached to the column can then be isolated and analyzed on
2-D electrophoresis gel as described in Ramunsen et al. (1997), the disclosure of which is
incorporated by reference. Alternatively, the proteins retained on the affinity column can be purified
by electrophoresis based methods and sequenced. The same method can be used to isolate
antibodies, to screen phage display products, or to screen phage display human antibodies.

D. Candidate ligands obtained by optical biosensor methods 

Proteins interacting with the BAP28 protein, or a fragment comprising a contiguous span
of at least 6 amino acids, preferably at least 8 to 10 amino acids, more preferably at least 12, 15, 20,
25, 30, 40, 50, or 100 amino acids of SEQ ID No 5, wherein said contiguous span includes either at
least 1, 2, 3, 5 or 10 of the amino acid positions selected from the group consisting of 1 to 1629 of
the SEQ ID No 5 or an amino acid selected from the group consisting of an asparagine at the amino
acid position 1694 of SEQ ID No 5, a valine at the amino acid position 1854 of SEQ ID No 5, an
asparagine at the amino acid position 1967 of SEQ ID No 5, a glutamic acid at the amino acid
position 2017 of SEQ ID No 5, and an alanine at the amino acid position 2050 of SEQ ID No 5, can
also be screened by using an Optical Biosensor as described in Edwards and Leatherbarrow (1997)
and also in Szabo et al. (1995), the disclosures of which are incorporated by reference. This technique
permits the detection of interactions between molecules in real time, without the need for labeled
molecules. This technique is based on the surface plasmon resonance (SPR) phenomenon. Briefly,
the candidate ligand molecule to be tested is attached to a surface (such as a carboxymethyl dextran
matrix). A light beam is directed towards the side of the surface that does not contain the sample to
be tested and is reflected by said surface. The SPR phenomenon causes a decrease in the intensity of
the reflected light at a specific combination of angle and wavelength. The binding of candidate
ligand molecules causes a change in the refractive index on the surface, which change is detected as a
change in the SPR signal. For screening of candidate ligand molecules or substances that are able to
interact with the BAP28 protein, or a fragment thereof, the BAP28 protein, or a fragment thereof, is
immobilized onto a surface. This surface consists of one side of a cell through which flows the
candidate molecule to be assayed. The binding of the candidate molecule on the BAP28 protein, or
a fragment thereof, is detected as a change of the SPR signal. The candidate molecules tested may
be proteins, peptides, carbohydrates, lipids, or small molecules generated by combinatorial
chemistry. This technique may also be performed by immobilizing eukaryotic or prokaryotic cells
or lipid vesicles exhibiting an endogenous or a recombinantly expressed BAP28 protein at their
surface.

The main advantage of the method is that it allows the determination of the association rate
between the BAP28 protein and molecules interacting with the BAP28 protein. It is thus possible to
select specifically ligand molecules interacting with the BAP28 protein, or a fragment thereof,
through strong or conversely weak association constants.
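
As an illustration of how ligands might be selected by association constant, the following Python sketch assumes a simple 1:1 binding model in which the equilibrium association constant is the ratio of the association rate constant to the dissociation rate constant. The model, the function names, and the tuple layout are assumptions for the example; the text does not specify how the constants are derived from the SPR signal.

```python
def association_constant(k_on, k_off):
    """Equilibrium association constant K_A (1/M) for 1:1 binding.

    k_on  : association rate constant, in 1/(M*s)
    k_off : dissociation rate constant, in 1/s
    """
    if k_off <= 0:
        raise ValueError("k_off must be positive")
    return k_on / k_off


def rank_by_affinity(ligands):
    """Sort (name, k_on, k_off) tuples from strongest to weakest K_A."""
    return sorted(
        ligands,
        key=lambda ligand: association_constant(ligand[1], ligand[2]),
        reverse=True,
    )
```

Ranking in the opposite direction (weakest first) only requires reversing the returned list, which corresponds to selecting for weak rather than strong association.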

E. Candidate ligands obtained through a two-hybrid screening assay. 

The yeast two-hybrid system is designed to study protein-protein interactions in vivo
(Fields and Song, 1989), and relies upon the fusion of a bait protein to the DNA binding domain of
the yeast Gal4 protein. This technique is also described in US Patent No 5,667,973 and
US Patent No 5,283,173 (Fields et al.), the technical teachings of both patents being herein
incorporated by reference.

The general procedure of library screening by the two-hybrid assay may be performed as 
described by Harper et al. (1993) or as described by Cho et al. (1998) or also Fromont-Racine et al. 
(1997). 

The bait protein or polypeptide consists of a BAP28 polypeptide or a fragment comprising
a contiguous span of at least 6 amino acids, preferably at least 8 to 10 amino acids, more preferably
at least 12, 15, 20, 25, 30, 40, 50, or 100 amino acids of SEQ ID No 5, wherein said contiguous span
includes either at least 1, 2, 3, 5 or 10 of the amino acid positions selected from the group consisting
of 1 to 1629 of the SEQ ID No 5 or an amino acid selected from the group consisting of an
asparagine at the amino acid position 1694 of SEQ ID No 5, a valine at the amino acid position 1854
of SEQ ID No 5, an asparagine at the amino acid position 1967 of SEQ ID No 5, a glutamic acid at
the amino acid position 2017 of SEQ ID No 5, and an alanine at the amino acid position 2050 of
SEQ ID No 5, or a variant thereof.

More precisely, the nucleotide sequence encoding the BAP28 polypeptide or a fragment or
variant thereof is fused to a polynucleotide encoding the DNA binding domain of the GAL4 protein,
the fused nucleotide sequence being inserted in a suitable expression vector, for example pAS2 or
pM3.

Then, a human cDNA library is constructed in a specially designed vector, such that the
human cDNA insert is fused to a nucleotide sequence in the vector that encodes the transcriptional
activation domain of the GAL4 protein. Preferably, the vector used is the pACT vector. The polypeptides
encoded by the nucleotide inserts of the human cDNA library are termed "prey" polypeptides.

A third vector contains a detectable marker gene, such as beta galactosidase gene or CAT 
gene that is placed under the control of a regulation sequence that is responsive to the binding of a 
complete Gal4 protein containing both the transcriptional activation domain and the DNA binding 
domain. For example, the vector pG5EC may be used. 

Two different yeast strains are also used. As an illustrative but non-limiting example, the
two different yeast strains may be the following:

- Y190, the phenotype of which is (MATa, Leu2-3, 112 ura3-12, trp1-901, his3-D200, ade2-101,
gal4D gal80D URA3 GAL-LacZ, LYS GAL-HIS3, cyhR);

- Y187, the phenotype of which is (MATα gal4 gal80 his3 trp1-901 ade2-101 ura3-52 leu2-3,
-112 URA3 GAL-lacZ met-), which is the opposite mating type of Y190.

Briefly, 20 µg of pAS2/BAP28 and 20 µg of pACT-cDNA library are co-transformed into
yeast strain Y190. The transformants are selected for growth on minimal media lacking histidine,
leucine and tryptophan, but containing the histidine synthesis inhibitor 3-AT (50 mM). Positive
colonies are screened for beta galactosidase by filter lift assay. The double positive colonies (His+,
beta-gal+) are then grown on plates lacking histidine and leucine, but containing tryptophan and
cycloheximide (10 mg/ml), to select for loss of pAS2/BAP28 plasmids but retention of pACT-cDNA
library plasmids. The resulting Y190 strains are mated with Y187 strains expressing BAP28 or
non-related control proteins, such as cyclophilin B, lamin, or SNF1, as Gal4 fusions as described by
Harper et al. (1993) and by Bram et al. (1993), and screened for beta galactosidase
by filter lift assay. Yeast clones that are beta gal- after mating with the control Gal4 fusions are
considered false positives.

In another embodiment of the two-hybrid method according to the invention, interaction
between the BAP28 protein or a fragment or variant thereof and cellular proteins may be assessed using the
Matchmaker Two Hybrid System 2 (Catalog No K1604-1, Clontech). As described in the manual
accompanying the Matchmaker Two Hybrid System 2 (Catalog No K1604-1, Clontech), the disclosure
of which is incorporated herein by reference, nucleic acids encoding the BAP28 protein or a portion
thereof are inserted into an expression vector such that they are in frame with DNA encoding the DNA
binding domain of the yeast transcriptional activator GAL4. A desired cDNA, preferably human
cDNA, is inserted into a second expression vector such that it is in frame with DNA encoding the
activation domain of GAL4. The two expression plasmids are transformed into yeast and the yeast are
plated on selection medium which selects for expression of selectable markers on each of the expression
vectors as well as GAL4 dependent expression of the HIS3 gene. Transformants capable of growing on
medium lacking histidine are screened for GAL4 dependent lacZ expression. Those cells which are
positive in both the histidine selection and the lacZ assay contain an interaction between BAP28 and the
protein or peptide encoded by the initially selected cDNA insert.

Methods For Screening Substances Modulating The Activity Of The BAP28 protein 

The invention also concerns a method for screening new agents, or candidate substances
which modulate the activity of the BAP28 protein or a fragment thereof. Preferably, the BAP28
protein or a fragment thereof is a polypeptide comprising a contiguous span of at least 6 amino
acids of SEQ ID No 5, wherein said contiguous span includes either at least 1, 2, 3, 5 or 10 of the
amino acid positions selected from the group consisting of 1 to 1629 of the SEQ ID No 5 or an
amino acid selected from the group consisting of an asparagine at the amino acid position 1694 of
SEQ ID No 5, a valine at the amino acid position 1854 of SEQ ID No 5, an asparagine at the amino
acid position 1967 of SEQ ID No 5, a glutamic acid at the amino acid position 2017 of SEQ ID No
5, and an alanine at the amino acid position 2050 of SEQ ID No 5. Preferably, the candidate
substance is mixed with the BAP28 protein and the activity of the BAP28 protein is measured.
Candidate substances include, without being limited to, natural or synthetic organic compounds or
molecules of biological origin such as polypeptides.

Method For Screening Substances Interacting With The Regulatory Sequences Of 
The BAP28 Gene 

The present invention also concerns a method for screening substances or molecules that
are able to interact with the regulatory sequences of the BAP28 gene, such as for example promoter
or enhancer sequences.

Nucleic acids encoding proteins which are able to interact with the regulatory sequences
of the BAP28 gene, more particularly a nucleotide sequence selected from the group consisting of
the polynucleotides of the 5' and 3' regulatory region or a fragment or variant thereof, and
preferably a variant comprising one of the biallelic markers of the invention, may be identified by
using a one-hybrid system, such as that described in the booklet enclosed in the Matchmaker
One-Hybrid System kit from Clontech (Catalog Ref. No K1603-1), the technical teachings of which are
herein incorporated by reference. Briefly, the target nucleotide sequence is cloned upstream of a
selectable reporter sequence and the resulting DNA construct is integrated in the yeast genome
(Saccharomyces cerevisiae). The yeast cells containing the reporter sequence in their genome are
then transformed with a library consisting of fusion molecules between cDNAs encoding candidate
proteins for binding onto the regulatory sequences of the BAP28 gene and sequences encoding the
activator domain of a yeast transcription factor such as GAL4. The recombinant yeast cells are
plated in a culture broth for selecting cells expressing the reporter sequence. The recombinant yeast
cells thus selected contain a fusion protein that is able to bind onto the target regulatory sequence of
the BAP28 gene. Then, the cDNAs encoding the fusion proteins are sequenced and may be cloned
into expression or transcription vectors in vitro. The binding of the encoded polypeptides to the
target regulatory sequences of the BAP28 gene may be confirmed by techniques familiar to one
skilled in the art, such as gel retardation assays or DNAse protection assays.

Gel retardation assays may also be performed independently in order to screen candidate
molecules that are able to interact with the regulatory sequences of the BAP28 gene, such as
described by Fried and Crothers (1981), Garner and Revzin (1981) and Dent and Latchman (1993),
the teachings of these publications being herein incorporated by reference. These techniques are
based on the principle according to which a DNA fragment which is bound to a protein migrates
more slowly than the same unbound DNA fragment. Briefly, the target nucleotide sequence is labeled.
Then the labeled target nucleotide sequence is brought into contact with either a total nuclear extract
from cells containing transcription factors, or with different candidate molecules to be tested. The
interaction between the target regulatory sequence of the BAP28 gene and the candidate molecule or
the transcription factor is detected after gel or capillary electrophoresis through a retardation in the
migration.

Method For Screening Ligands That Modulate The Expression Of The BAP28
Protein

Another subject of the present invention is a method for screening molecules that modulate 
the expression of the BAP28 protein. Such a screening method comprises the steps of: 

a) cultivating a prokaryotic or a eukaryotic cell that has been transfected with a nucleotide
sequence encoding the BAP28 protein or a variant or a fragment thereof, placed under the control of
its own promoter;

b) bringing into contact the cultivated cell with a molecule to be tested;

c) quantifying the expression of the BAP28 protein or a variant or a fragment thereof.

Using DNA recombination techniques well known to one skilled in the art, the BAP28
protein encoding DNA sequence is inserted into an expression vector, downstream from its promoter
sequence. As an illustrative example, the promoter sequence of the BAP28 gene is contained in the
nucleic acid of the 5' regulatory region.

The quantification of the expression of the BAP28 protein may be performed either at the
mRNA level or at the protein level. In the latter case, polyclonal or monoclonal antibodies may be
used to quantify the amounts of the BAP28 protein that have been produced, for example in an
ELISA or a RIA assay.

In a preferred embodiment, the quantification of the BAP28 mRNA is performed by a
quantitative PCR amplification of the cDNA obtained by a reverse transcription of the total mRNA
of the cultivated BAP28-transfected host cell, using a pair of primers specific for BAP28.
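
The text does not specify how the quantitative PCR readout is converted into an expression level. One widely used option is the comparative threshold-cycle (2^-ddCt) calculation, sketched below in Python. The use of this method, the reference gene, the assumption of roughly 100% amplification efficiency, and all names are assumptions made for illustration.

```python
def relative_expression(ct_target_treated, ct_ref_treated,
                        ct_target_control, ct_ref_control):
    """Fold change of target mRNA in treated versus control cells.

    Each argument is a threshold cycle (Ct). The target would be BAP28
    and the reference a hypothetical housekeeping gene used for
    normalization. Assumes the amount of product doubles each cycle.
    """
    delta_treated = ct_target_treated - ct_ref_treated
    delta_control = ct_target_control - ct_ref_control
    return 2.0 ** -(delta_treated - delta_control)
```

A value above 1 would indicate that the tested molecule increased BAP28 mRNA relative to the untreated culture, and a value below 1 that it decreased it.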

The present invention also concerns a method for screening substances or molecules that
are able to increase, or in contrast to decrease, the level of expression of the BAP28 gene. Such a
method may allow one skilled in the art to select substances exerting a regulating effect on the
expression level of the BAP28 gene and which may be useful as active ingredients included in
pharmaceutical compositions for treating patients suffering from prostate cancer.

Thus, also part of the present invention is a method for screening a candidate substance
or molecule that modulates the expression of the BAP28 gene, which method comprises the following
steps:

- providing a recombinant cell host containing a nucleic acid, wherein said nucleic acid
comprises a nucleotide sequence of the 5' regulatory region or a biologically active fragment or
variant thereof located upstream of a polynucleotide encoding a detectable protein;

- obtaining a candidate substance; and

- determining the ability of the candidate substance to modulate the expression levels of
the polynucleotide encoding the detectable protein.
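
The final step above amounts to comparing the detectable-protein signal with and without the candidate substance. The following Python sketch shows one simple way this could be done; the function names, the use of replicate means, and the two-fold cutoff are illustrative assumptions rather than values taken from the text.

```python
def fold_change(treated, untreated):
    """Ratio of mean reporter signal in treated versus untreated cells.

    `treated` and `untreated` are non-empty lists of replicate
    measurements (for example, beta galactosidase or GFP readings).
    """
    mean_treated = sum(treated) / len(treated)
    mean_untreated = sum(untreated) / len(untreated)
    if mean_untreated <= 0:
        raise ValueError("untreated signal must be positive")
    return mean_treated / mean_untreated


def classify_effect(fold, cutoff=2.0):
    """Label a fold change using a symmetric, illustrative cutoff."""
    if fold >= cutoff:
        return "increases expression"
    if fold <= 1.0 / cutoff:
        return "decreases expression"
    return "no clear effect"
```

In practice the cutoff would be set from the variability of the assay rather than fixed in advance.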

In a further embodiment, the nucleic acid comprising the nucleotide sequence of the 5'
regulatory region or a biologically active fragment or variant thereof also includes a 5'UTR region
of the BAP28 cDNA of SEQ ID No 2 or 3, or one of its biologically active fragments or variants.

Among the preferred polynucleotides encoding a detectable protein, there may be cited
polynucleotides encoding beta galactosidase, green fluorescent protein (GFP) and chloramphenicol
acetyl transferase (CAT). In some embodiments, the detectable protein can be BAP28 or a fragment
thereof.

The invention also pertains to kits useful for performing the hereinbefore described
screening method. Preferably, such kits comprise a recombinant vector that allows the expression of
a nucleotide sequence of the 5' regulatory region or a biologically active fragment or variant thereof
located upstream and operably linked to a polynucleotide encoding a detectable protein or the
BAP28 protein or a fragment or a variant thereof.

In another embodiment, a method for the screening of a candidate substance or molecule
that modulates the expression of the BAP28 gene comprises the following
steps:

a) providing a recombinant host cell containing a nucleic acid, wherein said nucleic acid
comprises a 5'UTR sequence of the BAP28 cDNA of SEQ ID No 2 or 3, or one of its biologically
active fragments or variants, the 5'UTR sequence or its biologically active fragment or variant being
operably linked to a polynucleotide encoding a detectable protein;

b) obtaining a candidate substance; and

c) determining the ability of the candidate substance to modulate the expression levels of the
polynucleotide encoding the detectable protein.

In a specific embodiment of the above screening method, the nucleic acid that comprises a
nucleotide sequence selected from the group consisting of the 5'UTR sequence of the BAP28 cDNA
of SEQ ID No 2 or 3 or one of its biologically active fragments or variants, includes a promoter
sequence which is endogenous with respect to the BAP28 5'UTR sequence.

In another specific embodiment of the above screening method, the nucleic acid that
comprises a nucleotide sequence selected from the group consisting of the 5'UTR sequence of the
BAP28 cDNA of SEQ ID No 2 or 3 or one of its biologically active fragments or variants, includes a
promoter sequence which is exogenous with respect to the BAP28 5'UTR sequence defined therein.

In a further preferred embodiment, the nucleic acid comprising the 5'-UTR sequence of the
BAP28 cDNA of SEQ ID No 2 or 3 or the biologically active fragments thereof includes a biallelic
marker selected from the group consisting of A1 to A58, preferably A1 to A27, A34, A37 to A41,
A43 to A49, A52, and A54 to A58, more preferably one of the biallelic markers A1, A4, 16, A30,
A31, A42, A50, A51, and A53, or the complements thereof.

The invention further deals with a kit for the screening of a candidate substance
modulating the expression of the BAP28 gene, wherein said kit comprises a recombinant vector that
comprises a nucleic acid including a 5'UTR sequence of the BAP28 cDNA of SEQ ID No 2 or 3, or
one of their biologically active fragments or variants, the 5'UTR sequence or its biologically active
fragment or variant being operably linked to a polynucleotide encoding a detectable protein.

For the design of suitable recombinant vectors useful for performing the screening
methods described above, reference is made to the section of the present specification wherein the
preferred recombinant vectors of the invention are detailed.

Expression levels and patterns of BAP28 may be analyzed by solution hybridization with
long probes as described in International Patent Application No WO 97/05277, the entire contents of
which are incorporated herein by reference. Briefly, the BAP28 cDNA or the BAP28 genomic DNA
described above, or fragments thereof, is inserted at a cloning site immediately downstream of a
bacteriophage (T3, T7 or SP6) RNA polymerase promoter to produce antisense RNA. Preferably,
the BAP28 insert comprises at least 100 or more consecutive nucleotides of the genomic DNA
sequence or the cDNA sequences. The plasmid is linearized and transcribed in the presence of
ribonucleotides comprising modified ribonucleotides (i.e. biotin-UTP and DIG-UTP). An excess of
this doubly labeled RNA is hybridized in solution with mRNA isolated from cells or tissues of
interest. The hybridizations are performed under standard stringent conditions (40-50°C for 16
hours in an 80% formamide, 0.4 M NaCl buffer, pH 7-8). The unhybridized probe is removed by
digestion with ribonucleases specific for single-stranded RNA (i.e. RNases CL3, T1, Phy M, U2 or
A). The presence of the biotin-UTP modification enables capture of the hybrid on a microtitration
plate coated with streptavidin. The presence of the DIG modification enables the hybrid to be
detected and quantified by ELISA using an anti-DIG antibody coupled to alkaline phosphatase.

Quantitative analysis of BAP28 gene expression may also be performed using arrays. As
used herein, the term array means a one dimensional, two dimensional, or multidimensional
arrangement of a plurality of nucleic acids of sufficient length to permit specific detection of
expression of mRNAs capable of hybridizing thereto. For example, the arrays may contain a
plurality of nucleic acids derived from genes whose expression levels are to be assessed. The arrays
may include the BAP28 genomic DNA, the BAP28 cDNA sequences or the sequences
complementary thereto or fragments thereof, particularly those comprising at least one of the
biallelic markers according to the present invention, preferably at least one of the biallelic markers A1
to A58, preferably A1 to A27, A34, A37 to A41, A43 to A49, A52, and A54 to A58, more
preferably at least one of the biallelic markers A1, A4, 16, A30, A31, A42, A50, A51, and A53.
Preferably, the fragments are at least 15 nucleotides in length. In other embodiments, the fragments
are at least 25 nucleotides in length. In some embodiments, the fragments are at least 50 nucleotides
in length. More preferably, the fragments are at least 100 nucleotides in length. In another preferred
embodiment, the fragments are more than 100 nucleotides in length. In some embodiments the
fragments may be more than 500 nucleotides in length.

For example, quantitative analysis of BAP28 gene expression may be performed with a 
complementary DNA microarray as described by Schena et al. (1995 and 1996). Full length BAP28 
cDNAs or fragments thereof are amplified by PCR and arrayed from a 96-well microtiter plate onto 
silylated microscope slides using high-speed robotics. Printed arrays are incubated in a humid 
chamber to allow rehydration of the array elements and rinsed, once in 0.2% SDS for 1 min, twice in 
water for 1 min and once for 5 min in sodium borohydride solution. The arrays are submerged in 
water for 2 min at 95°C, transferred into 0.2% SDS for 1 min, rinsed twice with water, air dried and 
stored in the dark at 25°C. 

Cell or tissue mRNA is isolated or commercially obtained and probes are prepared by a 
single round of reverse transcription. Probes are hybridized to 1 cm² microarrays under a 14 x 14 
mm glass coverslip for 6-12 hours at 60°C. Arrays are washed for 5 min at 25°C in low stringency 
wash buffer (1 x SSC/0.2% SDS), then for 10 min at room temperature in high stringency wash 
buffer (0.1 x SSC/0.2% SDS). Arrays are scanned in 0.1 x SSC using a fluorescence laser scanning 
device fitted with a custom filter set. Accurate differential expression measurements are obtained by 
taking the average of the ratios of two independent hybridizations. 
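
The averaging step described above can be illustrated with a minimal sketch. It assumes that each hybridization yields a pair of signal intensities (test sample and reference sample) for a given array element; the function name and the example intensities are hypothetical and are not taken from Schena et al.

```python
from statistics import mean


def average_expression_ratio(hybridizations):
    """Average the test/reference signal ratios of independent hybridizations.

    `hybridizations` is a list of (test_signal, reference_signal) pairs,
    one pair per independent hybridization (an assumed data layout).
    """
    ratios = []
    for test_signal, reference_signal in hybridizations:
        if reference_signal <= 0:
            raise ValueError("reference signal must be positive")
        ratios.append(test_signal / reference_signal)
    return mean(ratios)


# Two independent hybridizations with hypothetical intensities.
ratio = average_expression_ratio([(1200.0, 400.0), (900.0, 450.0)])
```

With these illustrative values the individual ratios are 3.0 and 2.0, so the averaged differential expression ratio is 2.5.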

Quantitative analysis of BAP28 gene expression may also be performed with full length 
BAP28 cDNAs or fragments thereof in complementary DNA arrays as described by Pietu et 
al. (1996). The full length BAP28 cDNA or fragments thereof is PCR amplified and spotted on 
membranes. Then, mRNAs originating from various tissues or cells are labeled with radioactive 
nucleotides. After hybridization and washing in controlled conditions, the hybridized mRNAs are 
detected by phospho-imaging or autoradiography. Duplicate experiments are performed and a 
quantitative analysis of differentially expressed mRNAs is then performed. 

Alternatively, expression analysis using the BAP28 genomic DNA, the BAP28 cDNA, or 
fragments thereof can be done through high density nucleotide arrays as described by Lockhart et 
al. (1996) and Sosnowski et al. (1997). Oligonucleotides of 15-50 nucleotides from the sequences of 
the BAP28 genomic DNA, the BAP28 cDNA sequences particularly those comprising at least one of 
the biallelic markers according to the present invention, preferably at least one biallelic marker selected 
from the group consisting of A1 to A58, preferably A1 to A27, A34, A37 to A41, A43 to A49, A52, 
and A54 to A58, more preferably at least one of the biallelic markers A1, A4, 16, A30, A31, A42, 
A50, A51, and A53, or the sequences complementary thereto, are synthesized directly on the chip 
(Lockhart et al., supra) or synthesized and then addressed to the chip (Sosnowski et al., supra). 
Preferably, the oligonucleotides are about 20 nucleotides in length. 

BAP28 cDNA probes labeled with an appropriate compound, such as biotin, digoxigenin 
or fluorescent dye, are synthesized from the appropriate mRNA population and then randomly 
fragmented to an average size of 50 to 100 nucleotides. The said probes are then hybridized to the 
chip. After washing as described in Lockhart et al., supra and application of different electric fields 
(Sosnowski et al., 1997), the dyes or labeling compounds are detected and quantified. Duplicate 
hybridizations are performed. Comparative analysis of the intensity of the signal originating from 
cDNA probes on the same target oligonucleotide in different cDNA samples indicates a differential 
expression of BAP28 mRNA. 

Computer-Related Embodiments 

As used herein the term "nucleic acid codes of the invention" encompasses the nucleotide 
sequences comprising, consisting essentially of, or consisting of any one of the following: 

a) a contiguous span of at least 12, 15, 18, 20, 25, 30, 35, 40, 50, 60, 70, 80, 90, 100, 150, 
200, 500, or 1000 nucleotides of SEQ ID No 1, wherein said contiguous span comprises at least 1, 2, 
3, 5, or 10 of the following nucleotide positions of SEQ ID No 1: 1-50357, 50499-50963, 
51257-52147, 52299-53234, 53394-53553, 53689-53837, 53943-54028, 54198-54740, 54896-55753, 
55913-57385, 57495-58503, 58828-85946, 59355-85946, 86169-91228, and/or 91852 to 97662; 

b) a contiguous span of at least 12, 15, 18, 20, 25, 30, 50, 80, 100, 150, 200, 250, 300, 350, 
400, 450, or 500 nucleotides of SEQ ID No 1 or the complement thereof, wherein said contiguous 
span comprises at least 1, 2, 3, 5, 10, 20, 30, 40 or 50 nucleotides selected from the group consisting 
of the following nucleotide positions of SEQ ID No 1: 4997-5076, 5371-5544, 6121-6337, 
9877-10018, 11522-11623, 12521-12661, 13453-13664, 13824-13957, 15376-15478, 16855-16965, 
17378-17495, 18535-18642, 21446-21541, 21999-22087, 23036-23247, 23546-23667, 
24270-24461, 26287-26470, 26611-26747, 28068-28260, 32540-32709, 33112-33270, 34586-34828, 
35156-35287, 36660-36763, 36934-37077, 37803-37921, 38017-38138, 40365-40493, 
42618-42848, 43452-43578, 44836-44999, 48223-48269, and 49656-49779; 

c) a contiguous span of at least 12, 15, 18, 20, 25, 30, 35, 40, 50, 60, 70, 80, 90, 100, 150, 
200, 500, or 1000 nucleotides of SEQ ID No 1 or the complements thereof, wherein said contiguous 
span comprises at least one BAP28-related biallelic marker selected from the group consisting of A1 
to A58, preferably A1 to A27, A34, A37 to A41, A43 to A49, A52, and A54 to A58, more 
preferably one of the biallelic markers A1, A4, 16, A30, A31, A42, A50, A51, and A53; 

d) a contiguous span of at least 12, 15, 18, 20, 25, 30, 35, 40, 50, 60, 70, 80, 90, 100, 150, 
200, 500, or 1000 nucleotides of a nucleic acid sequence selected from the group consisting of SEQ 
ID Nos 2 and 3 or the complements thereof, wherein said contiguous span comprises at least 1, 2, 3, 
5, or 10 of nucleotide positions 1 to 4995 of SEQ ID No 2 or 3; 

e) a contiguous span of at least 12, 15, 18, 20, 25, 30, 35, 40, 50, 60, 70, 80, 90, 100, 150, 
200, 500, or 1000 nucleotides of a nucleic acid sequence selected from the group consisting of SEQ 
ID Nos 2 and 3 or the complements thereof, wherein said contiguous span comprises at least 1, 2, 3, 
5, or 10 of nucleotide positions 1 to 2033, 2160 to 2348 and 2676 to 4995 of SEQ ID No 2 or 3; 

f) a contiguous span of at least 12, 15, 18, 20, 25, 30, 35, 40, 50, 60, 70, 80, 90, 100, 150, 
200, 500, or 1000 nucleotides of a nucleic acid sequence selected from the group consisting of SEQ 
ID Nos 1-3 or the complements thereof, wherein said contiguous span comprises at least 1, 2, 3, 5, 
or 10 of any one of the following ranges of nucleotide positions of: 

(1) SEQ ID No 1: 1-2500, 2501-5000, 5001-7500, 7501-10000, 10001-12500, 
12501-15000, 15001-17500, 17501-20000, 20001-22500, 22501-25000, 25001-27500, 27501-30000, 
30001-32500, 32501-35000, 35001-37500, 37501-40000, 40001-42500, 42501-45000, 
45001-47500, 47501-50000, 50001-50357, 50499-50963, 51257-52147, 52299-53234, 53394-53553, 
53689-53837, 53943-54028, 54198-54740, 54896-55753, 55913-57385, 57495-58503, 
58828-85946, 59355-85946, 86169-91228, and/or 91852 to 97662; 

(2) SEQ ID No 2: 1 to 500, 501 to 1000, 1001 to 1500, 1501 to 2000, 2001 to 2500, 2501 
to 3000, 3001 to 3500, 3501 to 4000, 4001 to 4500, 4501 to 4995, 5000 to 5500, 5501 to 6000, 6001 
to 6500, and 6501 to 6782; and, 

(3) SEQ ID No 3: 1 to 500, 501 to 1000, 1001 to 1500, 1501 to 2000, 2001 to 2500, 2501 
to 3000, 3001 to 3500, 3501 to 4000, 4001 to 4500, 4501 to 4995, 5000 to 5500, 5501 to 6000, 6001 
to 6500, 6501 to 7000, 7001 to 7500, 7501 to 7932; and 

g) a nucleotide sequence selected from the group consisting of SEQ ID Nos 4, and 9-13; and, 

h) a nucleotide sequence complementary to any one of the preceding nucleotide sequences. 
The "nucleic acid codes of the invention" further encompass nucleotide sequences 
homologous to: 

a) a contiguous span of at least 12, 15, 18, 20, 25, 30, 35, 40, 50, 60, 70, 80, 90, 100, 150, 
200, 500, or 1000 nucleotides of SEQ ID No 1, wherein said contiguous span comprises at least 1, 2, 
3, 5, or 10 of the following nucleotide positions of SEQ ID No 1: 1-50357, 50499-50963, 
51257-52147, 52299-53234, 53394-53553, 53689-53837, 53943-54028, 54198-54740, 54896-55753, 
55913-57385, 57495-58503, 58828-85946, 59355-85946, 86169-91228, and/or 91852 to 97662; 

b) a contiguous span of at least 12, 15, 18, 20, 25, 30, 50, 80, 100, 150, 200, 250, 300, 350, 
400, 450, or 500 nucleotides of SEQ ID No 1 or the complement thereof, wherein said contiguous 
span comprises at least 1, 2, 3, 5, 10, 20, 30, 40 or 50 nucleotides selected from the group consisting 
of the following nucleotide positions of SEQ ID No 1: 4997-5076, 5371-5544, 6121-6337, 
9877-10018, 11522-11623, 12521-12661, 13453-13664, 13824-13957, 15376-15478, 16855-16965, 
17378-17495, 18535-18642, 21446-21541, 21999-22087, 23036-23247, 23546-23667, 
24270-24461, 26287-26470, 26611-26747, 28068-28260, 32540-32709, 33112-33270, 34586-34828, 
35156-35287, 36660-36763, 36934-37077, 37803-37921, 38017-38138, 40365-40493, 
42618-42848, 43452-43578, 44836-44999, 48223-48269, and 49656-49779; 

c) a contiguous span of at least 12, 15, 18, 20, 25, 30, 35, 40, 50, 60, 70, 80, 90, 100, 150, 
200, 500, or 1000 nucleotides of SEQ ID No 1 or the complements thereof, wherein said contiguous 
span comprises at least one BAP28-related biallelic marker selected from the group consisting of A1 
to A58, preferably A1 to A27, A34, A37 to A41, A43 to A49, A52, and A54 to A58, more 
preferably one of the biallelic markers A1, A4, 16, A30, A31, A42, A50, A51, and A53; 

d) a contiguous span of at least 12, 15, 18, 20, 25, 30, 35, 40, 50, 60, 70, 80, 90, 100, 150, 
200, 500, or 1000 nucleotides of a nucleic acid sequence selected from the group consisting of SEQ 
ID Nos 2 and 3 or the complements thereof, wherein said contiguous span comprises at least 1, 2, 3, 
5, or 10 of nucleotide positions 1 to 4995 of SEQ ID No 2 or 3; 

e) a contiguous span of at least 12, 15, 18, 20, 25, 30, 35, 40, 50, 60, 70, 80, 90, 100, 150, 
200, 500, or 1000 nucleotides of a nucleic acid sequence selected from the group consisting of SEQ 
ID Nos 2 and 3 or the complements thereof, wherein said contiguous span comprises at least 1, 2, 3, 
5, or 10 of nucleotide positions 1 to 2033, 2160 to 2348 and 2676 to 4995 of SEQ ID No 2 or 3; 

f) a contiguous span of at least 12, 15, 18, 20, 25, 30, 35, 40, 50, 60, 70, 80, 90, 100, 150, 
200, 500, or 1000 nucleotides of a nucleic acid sequence selected from the group consisting of SEQ 
ID Nos 1-3 or the complements thereof, wherein said contiguous span comprises at least 1, 2, 3, 5, 
or 10 of any one of the following ranges of nucleotide positions of: 

(1) SEQ ID No 1: 1-2500, 2501-5000, 5001-7500, 7501-10000, 10001-12500, 
12501-15000, 15001-17500, 17501-20000, 20001-22500, 22501-25000, 25001-27500, 27501-30000, 
30001-32500, 32501-35000, 35001-37500, 37501-40000, 40001-42500, 42501-45000, 
45001-47500, 47501-50000, 50001-50357, 50499-50963, 51257-52147, 52299-53234, 53394-53553, 
53689-53837, 53943-54028, 54198-54740, 54896-55753, 55913-57385, 57495-58503, 
58828-85946, 59355-85946, 86169-91228, and/or 91852 to 97662; 

(2) SEQ ID No 2: 1 to 500, 501 to 1000, 1001 to 1500, 1501 to 2000, 2001 to 2500, 2501 
to 3000, 3001 to 3500, 3501 to 4000, 4001 to 4500, 4501 to 4995, 5000 to 5500, 5501 to 6000, 6001 
to 6500, and 6501 to 6782; and, 

(3) SEQ ID No 3: 1 to 500, 501 to 1000, 1001 to 1500, 1501 to 2000, 2001 to 2500, 2501 
to 3000, 3001 to 3500, 3501 to 4000, 4001 to 4500, 4501 to 4995, 5000 to 5500, 5501 to 6000, 6001 
to 6500, 6501 to 7000, 7001 to 7500, 7501 to 7932; and 

g) a nucleotide sequence selected from the group consisting of SEQ ID Nos 4, and 9-13; and, 

h) a nucleotide sequence complementary to any one of the preceding nucleotide sequences. 
Homologous sequences refer to a sequence having at least 99%, 98%, 97%, 96%, 95%, 90%, 
85%, 80%, or 75% homology to these contiguous spans. Homology may be determined using any 
method described herein, including BLAST2N with the default parameters or with any modified 
parameters. Homologous sequences also may include RNA sequences in which uridines replace the 
thymines in the nucleic acid codes of the invention. It will be appreciated that the nucleic acid codes of 
the invention can be represented in the traditional single character format (See the inside back cover of 
Stryer, Lubert. Biochemistry, 3rd edition. W. H. Freeman & Co., New York.) or in any other format or 
code which records the identity of the nucleotides in a sequence. 

As used herein the term "polypeptide codes of the invention" encompasses the polypeptide 
sequences comprising a contiguous span of at least 6, 8, 10, 12, 15, 20, 25, 30, 40, 50, or 100 amino 
acids of SEQ ID No 5, wherein said contiguous span includes either at least 1, 2, 3, 5 or 10 of the 
amino acid positions selected from the group consisting of 1 to 1629 of the SEQ ID No 5 or an 
amino acid selected from the group consisting of an asparagine at the amino acid position 1694 of 
SEQ ID No 5, a valine at the amino acid position 1854 of SEQ ID No 5, an asparagine at the amino 
acid position 1967 of SEQ ID No 5, a glutamic acid at the amino acid position 2017 of SEQ ID No 
5, and an alanine at the amino acid position 2050 of SEQ ID No 5. It will be appreciated that the 
polypeptide codes of the invention can be represented in the traditional single character format or three 
letter format (See the inside back cover of Stryer, Lubert. Biochemistry, 3rd edition. W. H. Freeman & 
Co., New York.) or in any other format or code which records the identity of the polypeptides in a 
sequence. 

It will be appreciated by those skilled in the art that the nucleic acid codes of the invention and 
polypeptide codes of the invention can be stored, recorded, and manipulated on any medium which can 
be read and accessed by a computer. As used herein, the words "recorded" and "stored" refer to a 
process for storing information on a computer medium. A skilled artisan can readily adopt any of the 
presently known methods for recording information on a computer readable medium to generate 
manufactures comprising one or more of the nucleic acid codes of the invention, or one or more of the 
polypeptide codes of the invention. Another aspect of the present invention is a computer readable 
medium having recorded thereon at least 2, 5, 10, 15, 20, 25, 30, or 50 nucleic acid codes of the 
invention. Another aspect of the present invention is a computer readable medium having recorded 
thereon at least 2, 5, 10, 15, 20, 25, 30, or 50 polypeptide codes of the invention. 

Computer readable media include magnetically readable media, optically readable media, 
electronically readable media and magnetic/optical media. For example, the computer readable media 
may be a hard disk, a floppy disk, a magnetic tape, CD-ROM, Digital Versatile Disk (DVD), Random 
Access Memory (RAM), or Read Only Memory (ROM) as well as other types of media known to 
those skilled in the art. 

Embodiments of the present invention include systems, particularly computer systems which 
store and manipulate the sequence information described herein. One example of a computer system 
100 is illustrated in block diagram form in Figure 7. As used herein, "a computer system" refers to the 
hardware components, software components, and data storage components used to analyze the 
nucleotide sequences of the nucleic acid codes of the invention or the amino acid sequences of the 
polypeptide codes of the invention. In one embodiment, the computer system 100 is a Sun Enterprise 
1000 server (Sun Microsystems, Palo Alto, CA). The computer system 100 preferably includes a 
processor for processing, accessing and manipulating the sequence data. The processor 105 can be any 
well-known type of central processing unit, such as the Pentium III from Intel Corporation, or a similar 
processor from Sun, Motorola, Compaq or International Business Machines. 

Preferably, the computer system 100 is a general purpose system that comprises the processor 
105 and one or more internal data storage components 110 for storing data, and one or more data 
retrieving devices for retrieving the data stored on the data storage components. A skilled artisan can 
readily appreciate that any one of the currently available computer systems is suitable. 

In one particular embodiment, the computer system 100 includes a processor 105 connected to 
a bus which is connected to a main memory 115 (preferably implemented as RAM) and one or more 
internal data storage devices 110, such as a hard drive and/or other computer readable media having 
data recorded thereon. In some embodiments, the computer system 100 further includes one or more 
data retrieving devices 118 for reading the data stored on the internal data storage devices 110. 

The data retrieving device 118 may represent, for example, a floppy disk drive, a compact disk 
drive, a magnetic tape drive, etc. In some embodiments, the internal data storage device 110 is a 
removable computer readable medium such as a floppy disk, a compact disk, a magnetic tape, etc. 
containing control logic and/or data recorded thereon. The computer system 100 may advantageously 
include or be programmed by appropriate software for reading the control logic and/or the data from the 
data storage component once inserted in the data retrieving device. 

The computer system 100 includes a display 120 which is used to display output to a computer 
user. It should also be noted that the computer system 100 can be linked to other computer systems 
125a-c in a network or wide area network to provide centralized access to the computer system 100. 

Software for accessing and processing the nucleotide sequences of the nucleic acid codes of the 
invention or the amino acid sequences of the polypeptide codes of the invention (such as search tools, 
compare tools, and modeling tools etc.) may reside in main memory 115 during execution. 

In some embodiments, the computer system 100 may further comprise a sequence comparer for 
comparing the above-described nucleic acid codes of the invention or the polypeptide codes of the 
invention stored on a computer readable medium to reference nucleotide or polypeptide sequences 
stored on a computer readable medium. A "sequence comparer" refers to one or more programs which 
are implemented on the computer system 100 to compare a nucleotide or polypeptide sequence with 
other nucleotide or polypeptide sequences and/or compounds including but not limited to peptides, 
peptidomimetics, and chemicals stored within the data storage means. For example, the sequence 
comparer may compare the nucleotide sequences of nucleic acid codes of the invention or the amino 
acid sequences of the polypeptide codes of the invention stored on a computer readable medium to 
reference sequences stored on a computer readable medium to identify homologies, motifs implicated in 
biological function, or structural motifs. The various sequence comparer programs identified elsewhere 
in this patent specification are particularly contemplated for use in this aspect of the invention. 

Figure 8 is a flow diagram illustrating one embodiment of a process 200 for comparing a new 
nucleotide or protein sequence with a database of sequences in order to determine the homology levels 
between the new sequence and the sequences in the database. The database of sequences can be a 
private database stored within the computer system 100, or a public database such as GENBANK, PIR 
or SWISSPROT that is available through the Internet. 

The process 200 begins at a start state 201 and then moves to a state 202 wherein the new 
sequence to be compared is stored to a memory in a computer system 100. As discussed above, the 
memory could be any type of memory, including RAM or an internal storage device. 

The process 200 then moves to a state 204 wherein a database of sequences is opened for 
analysis and comparison. The process 200 then moves to a state 206 wherein the first sequence stored 
in the database is read into a memory on the computer. A comparison is then performed at a state 210 
to determine if the first sequence is the same as the new sequence. It is important to note that this 
step is not limited to performing an exact comparison between the new sequence and the first sequence 
in the database. Methods for comparing two nucleotide or protein sequences, even if they are not 
identical, are well known to those of skill in the art. For example, gaps can be introduced into 
one sequence in order to raise the homology level between the two tested sequences. The parameters 
that control whether gaps or other features are introduced into a sequence during comparison are 
normally entered by the user of the computer system. 

Once a comparison of the two sequences has been performed at the state 210, a determination is 
made at a decision state 212 whether the two sequences are the same. Of course, the term "same" is not 
limited to sequences that are absolutely identical. Sequences that are within the homology parameters 
entered by the user will be marked as "same" in the process 200. 

If a determination is made that the two sequences are the same, the process 200 moves to a state 
214 wherein the name of the sequence from the database is displayed to the user. This state notifies the 
user that the sequence with the displayed name fulfills the homology constraints that were entered. 
Once the name of the stored sequence is displayed to the user, the process 200 moves to a decision state 
218 wherein a determination is made whether more sequences exist in the database. If no more 
sequences exist in the database, then the process 200 terminates at an end state 220. However, if more 
sequences do exist in the database, then the process 200 moves to a state 224 wherein a pointer is 
moved to the next sequence in the database so that it can be compared to the new sequence. In this 
manner, the new sequence is aligned and compared with every sequence in the database. 

It should be noted that if a determination had been made at the decision state 212 that the 
sequences were not homologous, then the process 200 would move immediately to the decision state 
218 in order to determine if any other sequences were available in the database for comparison. 
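
The loop of process 200 can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the database is modeled as an in-memory list of (name, sequence) pairs, the function names are hypothetical, and the similarity score comes from the standard-library `difflib.SequenceMatcher`, used here as a simple stand-in for a real gapped comparer such as BLAST.

```python
from difflib import SequenceMatcher


def similarity(seq_a, seq_b):
    """Return a similarity score between 0.0 and 1.0 (stand-in comparer)."""
    return SequenceMatcher(None, seq_a, seq_b, autojunk=False).ratio()


def search_database(new_sequence, database, threshold=0.9):
    """Return the names of database sequences judged "same" as new_sequence.

    `database` is a list of (name, sequence) pairs; `threshold` plays the
    role of the homology parameter entered by the user.
    """
    hits = []
    for name, stored_sequence in database:               # states 206 and 224
        score = similarity(new_sequence, stored_sequence)  # state 210
        if score >= threshold:                           # decision state 212
            hits.append(name)                            # state 214
    return hits                                          # end state 220


example_db = [
    ("seqA", "ACGTACGTAC"),
    ("seqB", "ACGTACGTAA"),
    ("seqC", "TTTTTTTTTT"),
]
matches = search_database("ACGTACGTAC", example_db, threshold=0.8)
```

Iterating over the list takes the place of the pointer that state 224 advances; a sequence that fails the threshold simply falls through to the next iteration, which mirrors the direct move from decision state 212 to decision state 218.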

Accordingly, one aspect of the present invention is a computer system comprising a 
processor, a data storage device having stored thereon a nucleic acid code of the invention or a 
polypeptide code of the invention, a data storage device having retrievably stored thereon reference 
nucleotide sequences or polypeptide sequences to be compared to the nucleic acid code of the 
invention or polypeptide code of the invention and a sequence comparer for conducting the 
comparison. The sequence comparer may indicate a homology level between the sequences 
compared or identify structural motifs in the nucleic acid code of the invention and polypeptide 
codes of the invention or it may identify structural motifs in sequences which are compared to these 
nucleic acid codes and polypeptide codes. In some embodiments, the data storage device may have 
stored thereon the sequences of at least 2, 5, 10, 15, 20, 25, 30, or 50 of the nucleic acid codes of the 
invention or polypeptide codes of the invention. 

Another aspect of the present invention is a method for determining the level of homology 
between a nucleic acid code of the invention and a reference nucleotide sequence, comprising the 
steps of reading the nucleic acid code and the reference nucleotide sequence through the use of a 
computer program which determines homology levels and determining homology between the nucleic 
acid code and the reference nucleotide sequence with the computer program. The computer program 
may be any of a number of computer programs for determining homology levels, including those 
specifically enumerated herein, including BLAST2N with the default parameters or with any modified 
parameters. The method may be implemented using the computer systems described above. The 
method may also be performed by reading 2, 5, 10, 15, 20, 25, 30, or 50 of the above described nucleic 
acid codes of the invention through the use of the computer program and determining homology 
between the nucleic acid codes and reference nucleotide sequences. 

Figure 9 is a flow diagram illustrating one embodiment of a process 250 in a computer for 
determining whether two sequences are homologous. The process 250 begins at a start state 252 and 
then moves to a state 254 wherein a first sequence to be compared is stored to a memory. The 
second sequence to be compared is then stored to a memory at a state 256. The process 250 then 
moves to a state 260 wherein the first character in the first sequence is read and then to a state 262 
wherein the first character of the second sequence is read. It should be understood that if the 
sequence is a nucleotide sequence, then the character would normally be either A, T, C, G or U. If 
the sequence is a protein sequence, then it should be in the single letter amino acid code so that the 
first and second sequences can be easily compared. 

A determination is then made at a decision state 264 whether the two characters are the 
same. If they are the same, then the process 250 moves to a state 268 wherein the next characters in 
the first and second sequences are read. A determination is then made whether the next characters 
are the same. If they are, then the process 250 continues this loop until two characters are not the 
same. If a determination is made that the next two characters are not the same, the process 250 
moves to a decision state 274 to determine whether there are any more characters in either sequence to 
read. 

If there aren't any more characters to read, then the process 250 moves to a state 276 
wherein the level of homology between the first and second sequences is displayed to the user. The 
level of homology is determined by calculating the proportion of characters between the sequences 
that were the same out of the total number of characters in the first sequence. Thus, if every 
character in a first 100 nucleotide sequence aligned with every character in a second sequence, the 
homology level would be 100%. 
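
As an illustration of the calculation just described, the following minimal sketch compares two sequences character by character and reports the matching characters as a percentage of the length of the first sequence. It assumes an ungapped, position-by-position comparison; the function name is hypothetical.

```python
def homology_level(first, second):
    """Percentage of positions in `first` whose character matches `second`.

    Positions beyond the end of `second` count as mismatches, so the
    denominator is always the number of characters in `first`.
    """
    if not first:
        raise ValueError("first sequence must not be empty")
    matches = 0
    for index, character in enumerate(first):
        if index < len(second) and character == second[index]:
            matches += 1
    return 100.0 * matches / len(first)
```

For example, comparing "ACGT" with "ACGA" under this sketch gives three matching characters out of four, i.e. a homology level of 75%, while two identical sequences give 100%.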

Alternatively, the computer program may be a computer program which compares the 
nucleotide sequences of the nucleic acid codes of the present invention to reference nucleotide 
sequences in order to determine whether the nucleic acid code of the invention differs from a reference 
nucleic acid sequence at one or more positions. In some embodiments, such a program records the 
length and identity of inserted, deleted or substituted nucleotides with respect to the sequence of either 
the reference polynucleotide or the nucleic acid code of the invention. In one embodiment, the 
computer program may be a program which determines whether the nucleotide sequences of the nucleic 
acid codes of the invention contain one or more single nucleotide polymorphisms (SNP) with respect to 
a reference nucleotide sequence. These single nucleotide polymorphisms may each comprise a single 
base substitution, insertion, or deletion. 
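
A program of this kind can be sketched as follows. The sketch assumes that the two sequences have already been aligned to equal length by some other tool, with "-" marking a gap; the function name and the tuple layout of the result are hypothetical choices made for illustration.

```python
def list_differences(reference, query):
    """List substitutions, insertions and deletions of `query` versus `reference`.

    Both inputs are pre-aligned strings of equal length in which '-' marks
    a gap. Each result is (kind, reference_position, reference_base,
    query_base), with positions counted from 1 along the ungapped reference.
    An insertion is reported at the last reference position before it.
    """
    if len(reference) != len(query):
        raise ValueError("aligned sequences must have the same length")
    differences = []
    ref_position = 0
    for ref_base, query_base in zip(reference, query):
        if ref_base != "-":
            ref_position += 1
        if ref_base == query_base:
            continue
        if ref_base == "-":
            kind = "insertion"
        elif query_base == "-":
            kind = "deletion"
        else:
            kind = "substitution"
        differences.append((kind, ref_position, ref_base, query_base))
    return differences


variants = list_differences("ACGT-A", "ATG-CA")
```

Each single-column entry in the returned list corresponds to a single base substitution, insertion, or deletion, which is the form of single nucleotide polymorphism described above.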

Another aspect of the present invention is a method for determining the level of homology 
between a polypeptide code of the invention and a reference polypeptide sequence, comprising the 
steps of reading the polypeptide code of the invention and the reference polypeptide sequence through 
use of a computer program which determines homology levels and determining homology between the 
polypeptide code and the reference polypeptide sequence using the computer program. 

Accordingly, another aspect of the present invention is a method for determining whether a 
nucleic acid code of the invention differs at one or more nucleotides from a reference nucleotide 
sequence comprising the steps of reading the nucleic acid code and the reference nucleotide sequence 
through use of a computer program which identifies differences between nucleic acid sequences and 
identifying differences between the nucleic acid code and the reference nucleotide sequence with the 
computer program. In some embodiments, the computer program is a program which identifies single 
nucleotide polymorphisms. The method may be implemented by the computer systems described above 
and the method illustrated in Figure 9. The method may also be performed by reading at least 2, 5, 10, 
15, 20, 25, 30, or 50 of the nucleic acid codes of the invention and the reference nucleotide sequences 
through the use of the computer program and identifying differences between the nucleic acid codes and 
the reference nucleotide sequences with the computer program. 

In other embodiments the computer based system may further comprise an identifier for 
identifying features within the nucleotide sequences of the nucleic acid codes of the invention or the 
amino acid sequences of the polypeptide codes of the invention. 

An "identifier" refers to one or more programs which identify certain features within the 
above-described nucleotide sequences of the nucleic acid codes of the invention or the amino acid 
sequences of the polypeptide codes of the invention. In one embodiment, the identifier may 
comprise a program which identifies an open reading frame in the cDNA codes of the invention. 

Figure 10 is a flow diagram illustrating one embodiment of an identifier process 300 for 
detecting the presence of a feature in a sequence. The process 300 begins at a start state 302 and 
then moves to a state 304 wherein a first sequence that is to be checked for features is stored to a 
memory 115 in the computer system 100. The process 300 then moves to a state 306 wherein a 
database of sequence features is opened. Such a database would include a list of each feature's 
attributes along with the name of the feature. For example, a feature name could be "Initiation 
Codon" and the attribute would be "ATG". Another example would be the feature name "TAATAA 
Box" and the feature attribute would be "TAATAA". An example of such a database is produced by 
the University of Wisconsin Genetics Computer Group (www.gcg.com). 

Once the database of features is opened at the state 306, the process 300 moves to a state 
308 wherein the first feature is read from the database. A comparison of the attribute of the first 
feature with the first sequence is then made at a state 310. A determination is then made at a 
decision state 316 whether the attribute of the feature was found in the first sequence. If the attribute 
was found, then the process 300 moves to a state 318 wherein the name of the found feature is 
displayed to the user. 

The process 300 then moves to a decision state 320 wherein a determination is made 
whether more features exist in the database. If no more features exist, then the process 300
terminates at an end state 324. However, if more features do exist in the database, then the process
300 reads the next sequence feature at a state 326 and loops back to the state 310 wherein the 
attribute of the next feature is compared against the first sequence. 

It should be noted that if the feature attribute is not found in the first sequence at the
decision state 316, the process 300 moves directly to the decision state 320 in order to determine if
any more features exist in the database.
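The loop of the identifier process 300 may be sketched as follows. This is a minimal illustration only: the feature database is represented as an in-memory list of (name, attribute) pairs, which is an assumption of the sketch, and the two entries shown are the examples given in the text. The state numbers in the comments refer to Figure 10 as described above.

```python
# Hypothetical in-memory stand-in for the database of sequence features.
FEATURE_DB = [
    ("Initiation Codon", "ATG"),
    ("TAATAA Box", "TAATAA"),
]

def identify_features(sequence, feature_db=FEATURE_DB):
    """Return the names of the features whose attribute occurs in the sequence."""
    sequence = sequence.upper()              # state 304: store the first sequence
    found = []
    for name, attribute in feature_db:       # states 308 and 326: read each feature
        if attribute.upper() in sequence:    # states 310 and 316: compare and decide
            found.append(name)               # state 318: report the feature name
    return found                             # state 324: no more features remain
```

A call such as `identify_features("ccatggctaataagg")` returns both feature names, whereas a sequence containing neither attribute returns an empty list.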

In another embodiment, the identifier may comprise a molecular modeling program which
determines the 3-dimensional structure of the polypeptide codes of the invention. In some
embodiments, the molecular modeling program identifies target sequences that are most compatible
with profiles representing the structural environments of the residues in known three-dimensional
protein structures. (See, e.g., Eisenberg et al., U.S. Patent No 5,436,850 issued July 25, 1995). In
another technique, the known three-dimensional structures of proteins in a given family are
superimposed to define the structurally conserved regions in that family. This protein modeling
technique also uses the known three-dimensional structure of a homologous protein to approximate
the structure of the polypeptide codes of the invention. (See e.g., Srinivasan, et al., U.S. Patent
No 5,557,535 issued September 17, 1996). Conventional homology modeling techniques have been
used routinely to build models of proteases and antibodies. (Sowdhamini et al., Protein Engineering
10:207, 215 (1997)). Comparative approaches can also be used to develop three-dimensional
protein models when the protein of interest has poor sequence identity to template proteins. In some
cases, proteins fold into similar three-dimensional structures despite having very weak sequence
identities. For example, the three-dimensional structures of a number of helical cytokines fold in
similar three-dimensional topology in spite of weak sequence homology.

The recent development of threading methods now enables the identification of likely
folding patterns in a number of situations where the structural relatedness between target and
template(s) is not detectable at the sequence level. Hybrid methods may be used, in which fold
recognition is performed using Multiple Sequence Threading (MST), structural equivalencies are
deduced from the threading output using the distance geometry program DRAGON to construct a
low resolution model, and a full-atom representation is constructed using a molecular modeling
package such as QUANTA.

According to this 3-step approach, candidate templates are first identified by using the novel
fold recognition algorithm MST, which is capable of performing simultaneous threading of multiple
aligned sequences onto one or more 3-D structures. In a second step, the structural equivalencies
obtained from the MST output are converted into interresidue distance restraints and fed into the
distance geometry program DRAGON, together with auxiliary information obtained from secondary
structure predictions. The program combines the restraints in an unbiased manner and rapidly
generates a large number of low resolution model conformations. In a third step, these low
resolution model conformations are converted into full-atom models and subjected to energy
minimization using the molecular modeling package QUANTA. (See e.g., Aszodi et al.,
Proteins: Structure, Function, and Genetics, Supplement 1:38-42 (1997)).

The results of the molecular modeling analysis may then be used in rational drug design
techniques to identify agents which modulate the activity of the polypeptide codes of the invention.

Accordingly, another aspect of the present invention is a method of identifying a feature
within the nucleic acid codes of the invention or the polypeptide codes of the invention comprising
reading the nucleic acid code(s) or the polypeptide code(s) through the use of a computer program
which identifies features therein and identifying features within the nucleic acid code(s) or
polypeptide code(s) with the computer program. In one embodiment, the computer program comprises a
computer program which identifies open reading frames. In a further embodiment, the computer
program identifies structural motifs in a polypeptide sequence. In another embodiment, the
computer program comprises a molecular modeling program. The method may be performed by
reading a single sequence or at least 2, 5, 10, 15, 20, 25, 30, or 50 of the nucleic acid codes of the
invention or the polypeptide codes of the invention through the use of the computer program and
identifying features within the nucleic acid codes or polypeptide codes with the computer program.

The nucleic acid codes of the invention or the polypeptide codes of the invention may be
stored and manipulated in a variety of data processor programs in a variety of formats. For example,
they may be stored as text in a word processing file, such as Microsoft WORD or WORDPERFECT or
as an ASCII file in a variety of database programs familiar to those of skill in the art, such as DB2,
SYBASE, or ORACLE. In addition, many computer programs and databases may be used as sequence
comparers, identifiers, or sources of reference nucleotide or polypeptide sequences to be compared to
the nucleic acid codes of the invention or the polypeptide codes of the invention. The following list is
intended not to limit the invention but to provide guidance to programs and databases which are useful
with the nucleic acid codes of the invention or the polypeptide codes of the invention. The programs
and databases which may be used include, but are not limited to: MacPattern (EMBL), DiscoveryBase
(Molecular Applications Group), GeneMine (Molecular Applications Group), Look (Molecular
Applications Group), MacLook (Molecular Applications Group), BLAST and BLAST2 (NCBI),
BLASTN and BLASTX (Altschul et al., 1990), FASTA (Pearson and Lipman, 1988), FASTDB
(Brutlag et al., 1990), Catalyst (Molecular Simulations Inc.), Catalyst/SHAPE (Molecular Simulations
Inc.), Cerius2.DBAccess (Molecular Simulations Inc.), HypoGen (Molecular Simulations Inc.), Insight
II (Molecular Simulations Inc.), Discover (Molecular Simulations Inc.), CHARMm (Molecular
Simulations Inc.), Felix (Molecular Simulations Inc.), DelPhi (Molecular Simulations Inc.),
QuanteMM (Molecular Simulations Inc.), Homology (Molecular Simulations Inc.), Modeler
(Molecular Simulations Inc.), ISIS (Molecular Simulations Inc.), Quanta/Protein Design (Molecular
Simulations Inc.), WebLab (Molecular Simulations Inc.), WebLab Diversity Explorer (Molecular
Simulations Inc.), Gene Explorer (Molecular Simulations Inc.), SeqFold (Molecular Simulations Inc.),
the EMBL/Swissprotein database, the MDL Available Chemicals Directory database, the MDL Drug
Data Report data base, the Comprehensive Medicinal Chemistry database, Derwent's World Drug
Index database, the BioByteMasterFile database, the Genbank database, and the Genseqn database.
Many other programs and data bases would be apparent to one of skill in the art given the present
disclosure.

Motifs which may be detected using the above programs include sequences encoding
leucine zippers, helix-turn-helix motifs, glycosylation sites, ubiquitination sites, alpha helices, and
beta sheets, signal sequences encoding signal peptides which direct the secretion of the encoded
proteins, sequences implicated in transcription regulation such as homeoboxes, acidic stretches,
enzymatic active sites, substrate binding sites, and enzymatic cleavage sites.



Throughout this application, various publications, patents and published patent
applications are cited. The disclosures of these publications, patents and published patent
specifications referenced in this application are hereby incorporated by reference into the present
disclosure to more fully describe the state of the art to which this invention pertains.

EXAMPLES 
Example 1 

Identification Of Biallelic Markers - DNA Extraction

Blood donors were of French Caucasian origin. They presented sufficient diversity
to be representative of a heterogeneous French population. The DNA from 100 unrelated and
healthy individuals was extracted, pooled and tested for the detection of biallelic markers. The pool
was constituted by mixing equivalent quantities of DNA from each individual.

30 ml of peripheral venous blood were taken from each donor in the presence of EDTA.
Cells (pellet) were collected after centrifugation for 10 minutes at 2000 rpm. Red cells were lysed
by a lysis solution (50 ml final volume: 10 mM Tris pH7.6; 5 mM MgCl2; 10 mM NaCl). The
solution was centrifuged (10 minutes, 2000 rpm) as many times as necessary to eliminate the
residual red cells present in the supernatant, after resuspension of the pellet in the lysis solution.

The pellet of white cells was lysed overnight at 42°C with 3.7 ml of lysis solution
composed of:

- 3 ml TE 10-2 (Tris-HCl 10 mM, EDTA 2 mM) / NaCl 0.4 M

- 200 µl SDS 10%

- 500 µl K-proteinase (2 mg K-proteinase in TE 10-2 / NaCl 0.4 M).

For the extraction of proteins, 1 ml saturated NaCl (6M) (1/3.5 v/v) was added. After
vigorous agitation, the solution was centrifuged for 20 minutes at 10000 rpm.

For the precipitation of DNA, 2 to 3 volumes of 100% ethanol were added to the previous 
supernatant, and the solution was centrifuged for 30 minutes at 2000 rpm. The DNA solution was 
rinsed three times with 70% ethanol to eliminate salts, and centrifuged for 20 minutes at 2000 rpm. 

The pellet was dried at 37°C, and resuspended in 1 ml TE 10-1 or 1 ml water. The DNA
concentration was evaluated by measuring the OD at 260 nm (1 unit OD = 50 µg/ml DNA).

To determine the presence of proteins in the DNA solution, the OD260/OD280 ratio was
determined. Only DNA preparations having an OD260/OD280 ratio between 1.8 and 2 were used
in the subsequent examples described below.
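The quantification and quality criteria above, together with the pooling of equivalent quantities of DNA, may be illustrated by the following sketch. The conversion factor (1 OD unit = 50 µg/ml) and the 1.8 to 2 acceptance window are those given in the text. The function names, the `dilution_factor` parameter, and the amount of DNA contributed per individual are assumptions introduced for illustration.

```python
UG_PER_ML_PER_OD260 = 50.0   # from the text: 1 OD unit at 260 nm = 50 µg/ml DNA

def dna_concentration_ug_per_ml(od260, dilution_factor=1.0):
    """DNA concentration from the OD at 260 nm (dilution_factor is an assumption)."""
    return od260 * UG_PER_ML_PER_OD260 * dilution_factor

def passes_purity(od260, od280, low=1.8, high=2.0):
    """True when the OD260/OD280 ratio lies within the accepted window."""
    return low <= od260 / od280 <= high

def pooling_volumes_ul(concentrations_ug_per_ml, mass_ug_each):
    """Volume (µl) of each sample that contributes the same mass of DNA to a pool."""
    # µg divided by µg/ml gives ml; multiply by 1000 to obtain µl.
    return [1000.0 * mass_ug_each / c for c in concentrations_ug_per_ml]
```

For instance, two preparations at 100 and 200 µg/ml would contribute 10 µl and 5 µl respectively in order to provide 1 µg each to the pool.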
Example 2

Identification Of Biallelic Markers: Amplification Of Genomic DNA By PCR 

The amplification of specific genomic sequences of the DNA samples of example 1 was
carried out on the pool of DNA obtained previously. In addition, 10 individual samples were
similarly amplified.

PCR assays were performed using the following protocol:

Final volume                                        25 µl
DNA                                                 2 ng/µl
MgCl2                                               2 mM
dNTP (each)                                         200 µM
primer (each)                                       2.9 ng/µl
Ampli Taq Gold DNA polymerase                       0.05 unit/µl
PCR buffer (10x = 0.1 M TrisHCl pH8.3 0.5M KCl)     1x
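The per-reaction quantities implied by the final concentrations above may be worked out as in the following sketch. The final concentrations and the 25 µl final volume are taken from the protocol. The helper names are illustrative, and any stock concentration passed to `stock_volume_ul` other than the 10x buffer is an assumption, since the text does not state stock concentrations.

```python
FINAL_VOLUME_UL = 25.0   # final reaction volume stated in the protocol

def amount_per_reaction(final_conc_per_ul, volume_ul=FINAL_VOLUME_UL):
    """Total amount in one reaction, given a final concentration per µl."""
    return final_conc_per_ul * volume_ul

def stock_volume_ul(final_conc, stock_conc, volume_ul=FINAL_VOLUME_UL):
    """Volume of stock to pipette, using C_stock * V_stock = C_final * V_final."""
    return final_conc * volume_ul / stock_conc

dna_ng = amount_per_reaction(2.0)           # 2 ng/µl of template DNA
primer_ng = amount_per_reaction(2.9)        # 2.9 ng/µl of each primer
polymerase_u = amount_per_reaction(0.05)    # 0.05 unit/µl of polymerase
buffer_ul = stock_volume_ul(1.0, 10.0)      # 1x final from the 10x PCR buffer
```

Under these figures a single 25 µl reaction contains about 50 ng of template DNA, 72.5 ng of each primer, 1.25 units of polymerase, and 2.5 µl of 10x buffer.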



Each pair of first primers was designed using the sequence information of the BAP28 gene
disclosed herein and the OSP software (Hillier & Green, 1991). These first primers were about
20 nucleotides in length.

Table 1

Amplicon   Position range of    Primer   Position range of    Primer   Complementary position
           the amplicon in      name     amplification primer name     range of amplification
           SEQ ID No 1                   in SEQ ID No 1                primer in SEQ ID No 1

5-381      4840    5266         B1       4840    4859         C1       5249    5266
5-382      5307    5729         B2       5307    5324         C2       5710    5729
99-7190    12946   13488        B3       12946   12963        C3       13471   13488
99-7203    23482   23929        B4       23482   23501        C4       23909   23929
5-383      27887   28315        B5       27887   27904        C5       28296   28315
99-7205    29833   30288        B6       29833   29853        C6       30270   30288
5-384      32439   32877        B7       32439   32457        C7       32858   32877
5-379      48110   48460        B8       48110   48127        C8       48441   48460
5-380      49558   49977        B9       49558   49577        C9       49958   49977
5-366      50162   50583        B10      50162   50180        C10      50564   50583
5-370      50937   51359        B11      50937   50955        C11      51341   51359
5-373      53437   53858        B12      53437   53455        C12      53840   53858
5-375      53974   54394        B13      53974   53993        C13      54375   54394
5-376      54602   55021        B14      54602   54619        C14      55002   55021
5-377      55608   56043        B15      55608   55625        C15      56025   56043
5-14       59673   60100        B16      59673   59692        C16      60083   60100
5-11       60718   61137        B17      60718   60737        C17      61119   61137
5-202      66177   66608        B23      66177   66194        C23      66589   66608
99-1605    71723   72170        B21      71723   71743        C21      72150   72170
5-2        71735   72169        B22      71735   71754        C22      72150   72169
5-171      85485   85905        B20      85485   85502        C20      85887   85905
5-169      86184   86600        B19      86184   86203        C19      86581   86600
99-1572    86932   87574        B18      86932   86952        C18      87556   87574
5-403      91068   91417        B24      91068   91085        C24      91398   91417

Amplicon   Position range of    Primer   Position range of    Primer   Complementary position   Positions
           the amplicon         name     amplification primer name     range of amplification   given in
                                                                       primer

99-13790   1       454          B25      1       20           C25      434     454              SEQ ID No 29
99-13798   1       447          B26      1       20           C26      427     447              SEQ ID No 25
99-13808   1       546          B27      1       20           C27      526     546              SEQ ID No 27
99-13809   1       444          B28      1       21           C28      424     444              SEQ ID No 30
99-13810   1       476          B29      1       18           C29      458     476              SEQ ID No 28
99-1585    1       546          B30      1       20           C30      527     546              SEQ ID No 23
99-1587    1       396          B31      1       21           C31      377     396              SEQ ID No 24
99-1597    1       693          B32      1       19           C32      675     693              SEQ ID No 31
99-1601    1       506          B33      1       18           C33      486     506              SEQ ID No 26
99-7177    1       504          B34      1       20           C34      484     504              SEQ ID No 18
99-7182    1       531          B35      1       20           C35      511     531              SEQ ID No 22
99-7186    1       528          B36      1       19           C36      510     528              SEQ ID No 21
99-7193    1       542          B37      1       20           C37      522     542              SEQ ID No 20
99-7212    1       492          B38      1       20           C38      472     492              SEQ ID No 19



Preferably, the primers contained a common oligonucleotide tail upstream of the specific bases
targeted for amplification which was useful for sequencing.

Primers PU contain the following additional PU 5' sequence:
TGTAAAACGACGGCCAGT; primers RP contain the following RP 5' sequence:
CAGGAAACAGCTATGACC. The primer containing the additional PU 5' sequence is listed in
SEQ ID No 11. The primer containing the additional RP 5' sequence is listed in SEQ ID No 12.
The synthesis of these primers was performed following the phosphoramidite method, on a
GENSET UFPS 24.1 synthesizer.
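The construction of a tailed amplification primer may be sketched as below. The two tail sequences are those given above. The function name `add_tail` and the 20-nucleotide gene-specific sequence are hypothetical placeholders; the actual gene-specific primers are defined by the position ranges of Table 1 and are not reproduced here.

```python
PU_TAIL = "TGTAAAACGACGGCCAGT"   # additional PU 5' sequence, from the text
RP_TAIL = "CAGGAAACAGCTATGACC"   # additional RP 5' sequence, from the text

def add_tail(tail, specific_primer):
    """Return the tail followed by the gene-specific primer, written 5' to 3'."""
    return tail + specific_primer.upper()

# Hypothetical gene-specific primer, used only to show the construction.
placeholder_primer = "ACGTACGTACGTACGTACGT"
tailed_pu_primer = add_tail(PU_TAIL, placeholder_primer)
```

Each tail adds 18 nucleotides, so a gene-specific primer of about 20 nucleotides yields a tailed oligonucleotide of about 38 nucleotides.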
DNA amplification was performed on a Genius II thermocycler. After heating at 95°C for
10 min, 40 cycles were performed. Each cycle comprised: 30 sec at 95°C, 54°C for 1 min, and 30
sec at 72°C. For final elongation, 10 min at 72°C ended the amplification. The quantities of the
amplification products obtained were determined on 96-well microtiter plates, using a fluorometer
and Picogreen as intercalating agent (Molecular Probes).
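The cycling program described above may be written out as data, from which the total programmed hold time follows by simple arithmetic. The durations and temperatures are those of the text. The variable names are illustrative, and the result excludes the temperature ramping time of the thermocycler, which the text does not specify.

```python
INITIAL_DENATURATION_S = 10 * 60                 # 10 min at 95°C
CYCLES = 40
CYCLE_STEPS_S = [("95C", 30), ("54C", 60), ("72C", 30)]
FINAL_ELONGATION_S = 10 * 60                     # 10 min at 72°C

def total_hold_time_s():
    """Sum of all programmed holds, in seconds (ramp times not included)."""
    per_cycle = sum(seconds for _, seconds in CYCLE_STEPS_S)
    return INITIAL_DENATURATION_S + CYCLES * per_cycle + FINAL_ELONGATION_S
```

Each cycle holds for 120 seconds, so the program amounts to 600 + 40 x 120 + 600 = 6000 seconds, that is 100 minutes of programmed holds.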
Example 3

Identification Of Biallelic Markers - Sequencing Of Amplified Genomic DNA And 
Identification Of Polymorphisms 

The sequencing of the amplified DNA obtained in example 2 was carried out on ABI 377 
sequencers. The sequences of the amplification products were determined using automated dideoxy
terminator sequencing reactions with a dye terminator cycle sequencing protocol. The products of 
the sequencing reactions were run on sequencing gels and the sequences were determined using gel 
image analysis (ABI Prism DNA Sequencing Analysis software (2.1.2 version)). 

The sequence data were further evaluated to detect the presence of biallelic markers within 
the amplified fragments. The polymorphism search was based on the presence of superimposed
peaks in the electrophoresis pattern resulting from different bases occurring at the same position as 
described previously. 

The localization of the biallelic markers on SEQ ID Nos 1, and 18 to 31 is shown
below in Table 2.

Also encompassed by the present invention are BAP28-related biallelic markers A1 to A58
described below in Table 2.

Table 2

Amplicon   BM    Marker Name   Localization in      Polymorphism  BM position in  BM position in
                               BAP28 gene           all1  all2    SEQ ID No 1     SEQ ID Nos 2, 3 & 4

5-381      A1    5-381-133     5' regulatory region A     G       4972
5-382      A2    5-382-162     Exon 2               C     T       5468            178
5-382      A3    5-382-310     Intron 2-3           C     T       5616
5-382      A4    5-382-316     Intron 2-3           G     C       5622
99-7190    A5    99-7190-213   Intron 6-7           C     T       13158
99-7203    A6    99-7203-282   Intron 16-17         A     T       23761
99-7203    A7    99-7203-286   Intron 16-17         C     T       23765
5-383      A8    5-383-42      Intron 19-20         A     G       27928
5-383      A9    5-383-184     Exon 20              G     T       28070           2677
99-7205    A10   99-7205-228   Intron 20-21         A     G       30061
5-384      A11   5-384-312     Intron 21-22         G     C       32750
5-379      A12   5-379-80      Intron 32-33         A     C       48189
5-380      A13   5-380-58      Intron 33-34         G     T       49615
5-380      A14   5-380-59      Intron 33-34         C     T       49616
5-366      A15   5-366-143     Intron 34-35         A     G       50304
5-370      A16   5-370-197     Exon 36              A     G       51133           5193
5-370      A17   5-370-247     Exon 36              C     T       51183           5243
5-373      A18   5-373-98      Intron 38-39         C     T       53534
5-373      A19   5-373-164     Exon 39              C     T       53600           5673
5-373      A20   5-373-222     Exon 39              A     G       53658           5731
5-375      A21   5-375-200     Exon 41              A     G       54173           6011
5-375      A22   5-375-259     Intron 41-42         C     T       54232
5-375      A23   5-375-296     Intron 41-42         G     C       54269
5-375      A24   5-375-399     Intron 41-42         G     C       54372
5-376      A25   5-376-266     Exon 42              A     G       54867           6162
5-377      A26   5-377-82      Intron 42-43         C     T       55689
5-377      A27   5-377-227     Exon 43              A     G       55834           6271
5-14       A28   5-14-165      Intron 45-B'         A     G       59937
5-11       A29   5-11-158      Intron 45-B'         C     T       60980
5-202      A36   5-202-117     Intron 45-B'         A     T       66492
5-202      A35   5-202-95      Intron 45-B'         A     C       66514
99-1605    A33   99-1605-112   Intron 45-B'         A     G       71834
5-2        A34   5-2-178       Intron 45-B'         A     G       71993
5-171      A32   5-171-204     Intron 45-B'         A     G       85702
5-169      A31   5-169-97      Intron B'-A'         G     C       86504
99-1572    A30   99-1572-440   Intron B'-A'         A     G       87135
5-403      A37   5-403-325     Intron B'-A'         C     T       91093
5-403      A38   5-403-294     Intron B'-A'         A     G       91124
5-403      A39   5-403-209     Intron B'-A'         C     T       91209
5-403      A40   5-403-156     Exon A'              C     T       91262           7935 in SEQ ID No 3; 256 in SEQ ID No 4

Amplicon   BM    Marker Name   Polymorphism  BM position
                               all1  all2

99-13790   A41   99-13790-129  C     T       127 in SEQ ID No 29
99-13798   A42   99-13798-284  A     G       283 in SEQ ID No 25
99-13808   A43   99-13808-80   A     T       79 in SEQ ID No 27
99-13808   A44   99-13808-268  A     C       266 in SEQ ID No 27
99-13808   A45   99-13808-425  G     C       419 in SEQ ID No 27
99-13808   A46   99-13808-455  A     G       453 in SEQ ID No 27
99-13809   A47   99-13809-153  A     G       153 in SEQ ID No 30
99-13810   A48   99-13810-214  C     T       212 in SEQ ID No 28
99-13810   A49   99-13810-170  A     T       168 in SEQ ID No 28
99-1585    A50   99-1585-373   C     T       372 in SEQ ID No 23
99-1587    A51   99-1587-281   A     G       278 in SEQ ID No 24
99-1597    A52   99-1597-162   A     G       162 in SEQ ID No 31
99-1601    A53   99-1601-402   A     T       402 in SEQ ID No 26
99-7177    A54   99-7177-81    C     T       81 in SEQ ID No 18
99-7182    A55   99-7182-49    C     T       49 in SEQ ID No 22
99-7186    A56   99-7186-212   A     G       212 in SEQ ID No 21
99-7193    A57   99-7193-228   G     C       226 in SEQ ID No 20
99-7212    A58   99-7212-346   C     T       345 in SEQ ID No 19





BM refers to "biallelic marker". All1 and all2 refer respectively to allele 1 and allele 2 of
the biallelic marker.

The biallelic markers A16, A19, A21 and A25 are located in exonic sequence and give rise to
amino acid polymorphisms. Indeed, the codon comprising the marker A16 encodes either a serine or
an asparagine in position 1694 of the SEQ ID No 5; the codon comprising the marker A19 encodes
either an alanine or a valine in position 1854 of the SEQ ID No 5; the codon comprising the marker
A21 encodes either an aspartic acid or an asparagine in position 1967 of the SEQ ID No 5; the
codon comprising the marker A25 encodes either a glycine or a glutamic acid in position 2017 of
SEQ ID No 5.
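The way in which a single-base polymorphism within a codon gives rise to two alternative amino acids may be illustrated with the standard genetic code. In the sketch below, the codons are hypothetical: they are chosen only so that the two alleles reported for each marker reproduce the amino acid pair stated above, and the actual codons of SEQ ID No 1, as well as the strand on which the alleles are read, should be taken from the sequence listing. The function name `allele_residues` is likewise an assumption.

```python
BASES = "TCAG"
# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ... GGG.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}

def allele_residues(codon, index, allele1, allele2):
    """Translate the codon with each allele placed at the polymorphic index (0-2)."""
    residues = []
    for allele in (allele1, allele2):
        variant = codon[:index] + allele + codon[index + 1:]
        residues.append(CODON_TABLE[variant])
    return tuple(residues)

# Hypothetical codons consistent with the amino acid pairs given in the text.
a16 = allele_residues("AGC", 1, "A", "G")   # asparagine / serine
a19 = allele_residues("GCT", 1, "C", "T")   # alanine / valine
a21 = allele_residues("GAC", 0, "A", "G")   # asparagine / aspartic acid
a25 = allele_residues("GGA", 1, "A", "G")   # glutamic acid / glycine
```

In each case the substitution of one purine or pyrimidine for the other at a first or second codon position changes the encoded residue, which is consistent with the four non-synonymous markers listed above.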

Table 3 discloses the probes specific for each biallelic marker.



Table 3

BM    Marker Name   Position range of probes   Probes
                    in SEQ ID No 1

A1    5-381-133     4960    4984               P1
A2    5-382-162     5456    5480               P2
A3    5-382-310     5604    5628               P3
A4    5-382-316     5610    5634               P4
A5    99-7190-213   13146   13170              P5
A6    99-7203-282   23749   23773              P6
A7    99-7203-286   23753   23777              P7
A8    5-383-42      27916   27940              P8
A9    5-383-184     28058   28082              P9
A10   99-7205-228   30049   30073              P10
A11   5-384-312     32738   32762              P11
A12   5-379-80      48177   48201              P12
A13   5-380-58      49603   49627              P13
A14   5-380-59      49604   49628              P14
A15   5-366-143     50292   50316              P15
A16   5-370-197     51121   51145              P16
A17   5-370-247     51171   51195              P17
A18   5-373-98      53522   53546              P18
A19   5-373-164     53588   53612              P19
A20   5-373-222     53646   53670              P20
A21   5-375-200     54161   54185              P21
A22   5-375-259     54220   54244              P22
A23   5-375-296     54257   54281              P23
A24   5-375-399     54360   54384              P24
A25   5-376-266     54855   54879              P25
A26   5-377-82      55677   55701              P26
A27   5-377-227     55822   55846              P27
A28   5-14-165      59925   59949              P28
A29   5-11-158      60968   60992              P29
A36   5-202-117     66480   66504              P36
A35   5-202-95      66502   66526              P35
A33   99-1605-112   71822   71846              P33
A34   5-2-178       71981   72005              P34
A32   5-171-204     85690   85714              P32
A31   5-169-97      86492   86516              P31
A30   99-1572-440   87123   87147              P30
A37   5-403-325     91081   91105              P37
A38   5-403-294     91112   91136              P38
A39   5-403-209     91197   91221              P39
A40   5-403-156     91250   91274              P40

BM    Marker Name   Position range of probes   Probes

A41   99-13790-129  115-139 in SEQ ID No 29    P41
A42   99-13798-284  271-295 in SEQ ID No 25    P42
A43   99-13808-80   67-91 in SEQ ID No 27      P43
A44   99-13808-268  254-278 in SEQ ID No 27    P44
A45   99-13808-425  407-431 in SEQ ID No 27    P45
A46   99-13808-455  441-465 in SEQ ID No 27    P46
A47   99-13809-153  141-165 in SEQ ID No 30    P47
A48   99-13810-214  200-224 in SEQ ID No 28    P48
A49   99-13810-170  156-180 in SEQ ID No 28    P49
A50   99-1585-373   360-384 in SEQ ID No 23    P50
A51   99-1587-281   266-290 in SEQ ID No 24    P51
A52   99-1597-162   150-174 in SEQ ID No 31    P52
A53   99-1601-402   390-414 in SEQ ID No 26    P53
A54   99-7177-81    69-93 in SEQ ID No 18      P54
A55   99-7182-49    37-61 in SEQ ID No 22      P55
A56   99-7186-212   200-224 in SEQ ID No 21    P56
A57   99-7193-228   214-238 in SEQ ID No 20    P57
A58   99-7212-346   333-357 in SEQ ID No 19    P58
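The probe position ranges of Table 3 can be related to the marker positions of Table 2 by a simple calculation: each listed range spans 25 nucleotides centered on the polymorphic base, that is 12 nucleotides on either side. The sketch below reproduces that arithmetic. The function name and the `flank` parameter are illustrative, and the value of 12 is inferred from the tabulated figures rather than stated in the text.

```python
def probe_range(marker_position, flank=12):
    """Return the (start, end) of a probe centered on the polymorphic base."""
    return marker_position - flank, marker_position + flank

# Marker A1 lies at position 4972 of SEQ ID No 1 (Table 2).
p1_start, p1_end = probe_range(4972)
# Marker A41 lies at position 127 of SEQ ID No 29 (Table 2).
p41_start, p41_end = probe_range(127)
```

These calls give 4960 to 4984 for P1 and 115 to 139 for P41, in agreement with the ranges listed in Table 3.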



Example 4 



Validation Of The Polymorphisms Through Microsequencing 

The biallelic markers identified in example 3 were further confirmed and their respective
frequencies were determined through microsequencing. Microsequencing was carried out for each
individual DNA sample described in Example 1.

Amplification from genomic DNA of individuals was performed by PCR as described 
above for the detection of the biallelic markers with the same set of PCR primers. 

The preferred primers used in microsequencing were about 19 nucleotides in length and
hybridized just upstream of the considered polymorphic base. According to the invention, the
primers used in microsequencing are detailed in Table 4.

Table 4

Marker Name   BM    Mis1  Position range of        Mis2  Complementary position range
                          microsequencing primer         of microsequencing primer
                          mis 1 in SEQ ID No 1           mis 2 in SEQ ID No 1

5-381-133     A1    D1    4953    4971             E1    4973    4991
5-382-162     A2    D2    5449    5467             E2    5469    5487
5-382-310     A3    D3    5597    5615             E3    5617    5635
5-382-316     A4    D4    5603    5621             E4    5623    5641
99-7190-213   A5    D5    13139   13157            E5    13159   13177
99-7203-282   A6    D6    23742   23760            E6    23762   23780
99-7203-286   A7    D7    23746   23764            E7    23766   23784
5-383-42      A8    D8    27909   27927            E8    27929   27947
5-383-184     A9    D9    28051   28069            E9    28071   28089
99-7205-228   A10   D10   30042   30060            E10   30062   30080
5-384-312     A11   D11   32731   32749            E11   32751   32769
5-379-80      A12   D12   48170   48188            E12   48190   48208
5-380-58      A13   D13   49596   49614            E13   49616   49634
5-380-59      A14   D14   49597   49615            E14   49617   49635
5-366-143     A15   D15   50285   50303            E15   50305   50323
5-370-197     A16   D16   51114   51132            E16   51134   51152
5-370-247     A17   D17   51164   51182            E17   51184   51202
5-373-98      A18   D18   53515   53533            E18   53535   53553
5-373-164     A19   D19   53581   53599            E19   53601   53619
5-373-222     A20   D20   53639   53657            E20   53659   53677
5-375-200     A21   D21   54154   54172            E21   54174   54192
5-375-259     A22   D22   54213   54231            E22   54233   54251
5-375-296     A23   D23   54250   54268            E23   54270   54288
5-375-399     A24   D24   54353   54371            E24   54373   54391
5-376-266     A25   D25   54848   54866            E25   54868   54886
5-377-82      A26   D26   55670   55688            E26   55690   55708
5-377-227     A27   D27   55815   55833            E27   55835   55853
5-14-165      A28   D28   59918   59936            E28   59938   59956
5-11-158      A29   D29   60961   60979            E29   60981   60999
5-202-117     A36   D36   66473   66491            E36   66493   66511
5-202-95      A35   D35   66495   66513            E35   66515   66533
99-1605-112   A33   D33   71815   71833            E33   71835   71853
5-2-178       A34   D34   71974   71992            E34   71994   72012
5-171-204     A32   D32   85683   85701            E32   85703   85721
5-169-97      A31   D31   86485   86503            E31   86505   86523
99-1572-440   A30   D30   87116   87134            E30   87136   87154
5-403-325     A37   D37   91074   91092            E37   91094   91112
5-403-294     A38   D38   91105   91123            E38   91125   91143
5-403-209     A39   D39   91190   91208            E39   91210   91228
5-403-156     A40   D40   91243   91261            E40   91263   91281

Marker Name   BM    Mis1  Position range of        Mis2  Complementary position range
                          microsequencing primer         of microsequencing primer
                          mis 1                          mis 2

99-13790-129  A41   D41   108-126 in SEQ ID No 29  E41   128-146 in SEQ ID No 29
99-13798-284  A42   D42   264-282 in SEQ ID No 25  E42   284-302 in SEQ ID No 25
99-13808-80   A43   D43   60-78 in SEQ ID No 27    E43   80-98 in SEQ ID No 27
99-13808-268  A44   D44   247-265 in SEQ ID No 27  E44   267-285 in SEQ ID No 27
99-13808-425  A45   D45   400-418 in SEQ ID No 27  E45   420-438 in SEQ ID No 27
99-13808-455  A46   D46   434-452 in SEQ ID No 27  E46   454-472 in SEQ ID No 27
99-13809-153  A47   D47   134-152 in SEQ ID No 30  E47   154-172 in SEQ ID No 30
99-13810-214  A48   D48   193-211 in SEQ ID No 28  E48   213-231 in SEQ ID No 28
99-13810-170  A49   D49   149-167 in SEQ ID No 28  E49   169-187 in SEQ ID No 28
99-1585-373   A50   D50   353-371 in SEQ ID No 23  E50   373-391 in SEQ ID No 23
99-1587-281   A51   D51   259-277 in SEQ ID No 24  E51   279-297 in SEQ ID No 24
99-1597-162   A52   D52   143-161 in SEQ ID No 31  E52   163-181 in SEQ ID No 31
99-1601-402   A53   D53   383-401 in SEQ ID No 26  E53   403-421 in SEQ ID No 26
99-7177-81    A54   D54   62-80 in SEQ ID No 18    E54   82-100 in SEQ ID No 18
99-7182-49    A55   D55   30-48 in SEQ ID No 22    E55   50-68 in SEQ ID No 22
99-7186-212   A56   D56   193-211 in SEQ ID No 21  E56   213-231 in SEQ ID No 21
99-7193-228   A57   D57   207-225 in SEQ ID No 20  E57   227-245 in SEQ ID No 20
99-7212-346   A58   D58   326-344 in SEQ ID No 19  E58   346-364 in SEQ ID No 19



Mis 1 and Mis 2 respectively refer to microsequencing primers which hybridized with the
non-coding strand of the BAP28 gene or with the coding strand of the BAP28 gene.
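The position ranges of Table 4 follow from the marker positions of Table 2: each microsequencing primer is 19 nucleotides long and ends immediately adjacent to the polymorphic base, Mis 1 on the upstream side and Mis 2 on the downstream side. The following sketch reproduces this arithmetic. The function name is illustrative, and the default length of 19 reflects the tabulated ranges and the "about 19 nucleotides" stated in the text.

```python
def microsequencing_primer_ranges(marker_position, length=19):
    """Return ((mis1_start, mis1_end), (mis2_start, mis2_end)).

    Mis 1 covers the bases immediately upstream of the polymorphic base;
    Mis 2 covers the complementary positions immediately downstream.
    """
    mis1 = (marker_position - length, marker_position - 1)
    mis2 = (marker_position + 1, marker_position + length)
    return mis1, mis2

d1, e1 = microsequencing_primer_ranges(4972)    # marker A1 in SEQ ID No 1
d41, e41 = microsequencing_primer_ranges(127)   # marker A41 in SEQ ID No 29
```

For marker A1 this gives 4953 to 4971 for D1 and 4973 to 4991 for E1, as listed in Table 4; neither primer includes the polymorphic base itself, which is interrogated by the incorporated ddNTP.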
The microsequencing reaction was performed as follows:

After purification of the amplification products, the microsequencing reaction mixture was
prepared by adding, in a 20 µl final volume: 10 pmol microsequencing oligonucleotide, 1 U
Thermosequenase (Amersham E79000G), 1.25 µl Thermosequenase buffer (260 mM Tris HCl pH
9.5, 65 mM MgCl2), and the two appropriate fluorescent ddNTPs (Perkin Elmer, Dye Terminator Set
401095) complementary to the nucleotides at the polymorphic site of each biallelic marker tested,
following the manufacturer's recommendations. After 4 minutes at 94°C, 20 PCR cycles of 15 sec
at 55°C, 5 sec at 72°C, and 10 sec at 94°C were carried out in a Tetrad PTC-225 thermocycler (MJ
Research). The unincorporated dye terminators were then removed by ethanol precipitation.
Samples were finally resuspended in formamide-EDTA loading buffer and heated for 2 min at 95°C
before being loaded on a polyacrylamide sequencing gel. The data were collected by an ABI
PRISM 377 DNA sequencer and processed using the GENESCAN software (Perkin Elmer).

Following gel analysis, data were automatically processed with software that allows the
determination of the alleles of biallelic markers present in each amplified fragment.

The software evaluates such factors as whether the intensities of the signals resulting from 
the above microsequencing procedures are weak, normal, or saturated, or whether the signals are 
ambiguous. In addition, the software identifies significant peaks (according to shape and height 

15 criteria). Among the significant peaks, peaks corresponding to the targeted site are identified based 
on their position. When two significant peaks are detected for the same position, each sample is 
categorized classification as homozygous or heterozygous type based on the height ratio. 
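
By way of illustration only, the height-ratio classification described above can be sketched as follows. This is not the software actually used: the function name `call_genotype`, the minimum peak height and the ratio threshold are hypothetical values chosen for the example, since the text does not state the criteria applied.

```python
def call_genotype(height_a, height_b, allele_a="C", allele_b="T",
                  min_height=50.0, het_ratio=0.25):
    """Classify a sample from the peak heights of its two dye terminators.

    min_height and het_ratio are hypothetical thresholds, not values
    taken from the text.
    """
    strongest = max(height_a, height_b)
    weakest = min(height_a, height_b)
    if strongest < min_height:
        return None  # signal too weak to call a genotype
    if weakest / strongest >= het_ratio:
        return allele_a + allele_b  # two significant peaks: heterozygous
    # A single significant peak: homozygous for the stronger allele
    return allele_a * 2 if height_a > height_b else allele_b * 2


print(call_genotype(1000.0, 20.0))   # CC
print(call_genotype(800.0, 600.0))   # CT
```

A real implementation would also need the peak shape and saturation checks mentioned above, which are omitted here.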

Example 5 

Association Study Between Prostate Cancer And The Biallelic Markers Of The PCTA-1 Gene 

Collection Of DNA Samples From Affected And Non-Affected Individuals

Affected population:

The positive trait followed in this association study was prostate cancer. Prostate cancer
patients were recruited according to a combination of clinical, histological and biological inclusion
criteria. Clinical criteria can include rectal examination and prostate biopsies. Biological criteria
can include PSA assays. Affected individuals were recorded as familial cases when at least two
persons affected by prostate cancer had been diagnosed in the family. The remaining cases were
classified as sporadic cases, and more particularly as sporadic informative cases (at least two sibs
of the case, both aged over 50 years, are unaffected) or sporadic uninformative cases (no information
about sibs over 50 years old is available). All affected individuals included in the statistical analysis
of this patent were unrelated. Cases were also separated according to age at diagnosis: early
onset prostate cancer (under 65 years old) and late onset prostate cancer (65 years old or more).

Unaffected population:

Control individuals included in this study were checked both for the absence of all clinical
and biological criteria defining the presence or the risk of prostate cancer (PSA < 4) (WO 96/21042)
and for their age (65 years old or more). All unaffected individuals included in the statistical
analysis of this patent were unrelated.

The affected group was composed of 491 unrelated individuals, comprising:

- 197 familial cases; and

- 294 sporadic cases, 70 of which are sporadic informative cases.

The unaffected group contained 313 individuals who were 65 years old or older.

Genotyping Of Affected And Control Individuals 

The general strategy for the association studies was to individually scan the DNA
samples from all individuals in each of the populations described above in order to establish the
allele frequencies of the above described biallelic markers in each of these populations. More
particularly, the 30 biallelic markers used in the present association study are described in Table 5.

Allelic frequencies of the biallelic markers of Table 5 in each population were
determined by performing microsequencing reactions on amplified fragments obtained by genomic
PCR performed on the DNA samples from each individual. Genomic PCR and microsequencing
were performed as detailed above in Examples 2 and 4 using the described PCR and
microsequencing primers.
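
As a minimal sketch of how an allele frequency follows from the individual genotypes obtained by microsequencing, the allele copies can simply be counted over all genotyped chromosomes. The function name `allele_frequency` and the genotype counts in the example are hypothetical; they are not data from this study.

```python
def allele_frequency(n_aa, n_ab, n_bb):
    """Frequency of allele A from the counts of AA, AB and BB individuals."""
    total_alleles = 2 * (n_aa + n_ab + n_bb)  # two chromosomes per individual
    if total_alleles == 0:
        raise ValueError("no genotyped individuals")
    return (2 * n_aa + n_ab) / total_alleles


# Hypothetical counts: 100 AA, 80 AB and 20 BB individuals
print(allele_frequency(100, 80, 20))  # 0.7
```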

Table 5 



| BM | Marker Name | Position in BAP28 gene | Position in PCTA-1 gene | Nb of controls | Frequency % (allele) |
| A54 | 99-7177/81 | 5' of gene | 3' of gene | 257 | 69.07 (C) |
| A58 | 99-7212/346 | 5' of gene | 3' of gene | 259 | 66.99 (C) |
| A57 | 99-7193/228 | 5' of gene | 3' of gene | 250 | 59.2 (C) |
| A56 | 99-7186/212 | 5' of gene | 3' of gene | 292 | 66.1 (A) |
| A55 | 99-7182/49 | 5' of gene | 3' of gene | 287 | 63.59 (C) |
| A1 | 5-381/133 | 5' regulatory region | 3' of gene | 304 | 65.46 (G) |
| A4 | 5-382/316 | intron 2-3 | 3' of gene | 304 | 65.79 (C) |
| A5 | 99-7190/213 | intron 6-7 | 3' of gene | 297 | 72.9 (C) |
| A7 | 99-7203/286 | intron 16-17 | 3' of gene | 257 | 68.09 (T) |
| A11 | 5-384/312 | intron 21-22 | 3' of gene | 211 | 73.22 (G) |
| A12 | 5-379/80 | intron 32-33 | 3' of gene | 294 | 73.98 (A) |
| A16 | 5-370/197 | Exon 36 | 3' of gene | 287 | 76.31 (G) |
| A19 | 5-373/164 | Exon 39 | 3' of gene | 298 | 68.62 (C) |
| A21 | 5-375/200 | exon 41 | 3' of gene | 307 | 68.73 (G) |
| A25 | 5-376/266 | exon 42 | 3' of gene | 298 | 68.96 (G) |
| A27 | 5-377/227 | exon 43 | 3' of gene | 307 | 68.73 (A) |
| A28 | 5-14/165 | intron 45-B' | 3'UTR | 307 | 65.15 (T) |
| A29 | 5-11/158 | intron 45-B' | 3'UTR | 303 | 75.41 (G) |
| A35 | 5-202/95 | intron 45-B' | Exon 6b | 308 | 95.13 (G) |
| A33 | 99-1605/112 | intron 45-B' | intron 2 | 304 | 68.75 (G) |
| A34 | 5-2/178 | intron 45-B' | Exon 2 | 306 | 68.3 (C) |
| A32 | 5-171/204 | intron 45-B' | intron B | 307 | 70.85 (T) |
| A31 | 5-169/97 | intron B'-A' | intron D | 305 | 82.3 (C) |
| A30 | 99-1572/440 | intron B'-A' | intron D | 304 | 65.79 (T) |
| A50 | 99-1585/373 | 3' of gene | 5' of gene | 300 | 78 (C) |
| A51 | 99-1587/281 | 3' of gene | 5' of gene | 286 | 67.31 (G) |
| A42 | 99-13798/284 | 3' of gene | 5' of gene | 278 | 53.42 (A) |
| A53 | 99-1601/402 | 3' of gene | 5' of gene | 305 | 67.21 (A) |
| A43 | 99-13808/80 | 3' of gene | 5' of gene | 214 | 59.58 (T) |
| A48 | 99-13810/214 | 3' of gene | 5' of gene | 289 | 59.86 (T) |



Association Study Between Prostate Cancer And The Biallelic Markers Of The 
BAP28 Gene : Single marker association 

Frequencies of the alleles of the biallelic markers were compared in the case-control populations
described above. Different sub-populations were compared according to phenotype (sporadic and
familial cases vs controls) to characterise the association.

Figure 5 shows the results of the allelic association analysis for markers localized in and
around the BAP28 gene. This analysis tests the difference in allelic frequency for each marker between
populations. The statistical significance of this difference is assessed by performing a Pearson
chi-square test with one degree of freedom.
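
For illustration, a Pearson chi-square test with one degree of freedom on a 2x2 table of allele counts (cases vs controls, allele 1 vs allele 2) can be sketched as below. The function name `allelic_chi_square` and the counts in the example are hypothetical; the sketch assumes that every row and column total is non-zero.

```python
import math


def allelic_chi_square(case_a, case_b, ctrl_a, ctrl_b):
    """Pearson chi-square (1 df) on a 2x2 table of allele counts.

    Returns (statistic, p_value). Assumes no row or column total is zero.
    """
    observed = ((case_a, case_b), (ctrl_a, ctrl_b))
    row_totals = (case_a + case_b, ctrl_a + ctrl_b)
    col_totals = (case_a + ctrl_a, case_b + ctrl_b)
    n = row_totals[0] + row_totals[1]
    stat = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_totals[i] * col_totals[j] / n
            stat += (observed[i][j] - expected) ** 2 / expected
    # Survival function of a chi-square with 1 df: erfc(sqrt(x / 2))
    p_value = math.erfc(math.sqrt(stat / 2.0))
    return stat, p_value


# Hypothetical allele counts: cases 60/40, controls 40/60
stat, p = allelic_chi_square(60, 40, 40, 60)
print(round(stat, 2))  # 8.0
```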

The genotyped markers A55 (99-7182/49), A4 (5-382/316), A19 (5-373/164), A28 (5-14/165),
A42 (99-13798/284), and A53 (99-1601/402) are significant at the 5% level for the allelic test
(p-values of 4 x 10^-2, 4 x 10^-3, 4 x 10^-2, 1 x 10^-2, 2 x 10^-2, and 7 x 10^-3, respectively)
for sporadic cases. The 4 markers A28 (5-14/165), A4 (5-382/316), A1 (5-381/133), and
A55 (99-7182/49) present a highly significant association for the allelic test (p-values of
4 x 10^-5, 8 x 10^-6, 3 x 10^[exponent illegible], and 1 x 10^-4, respectively) between
informative sporadic cases and controls. The marker A30 (99-1572/440) is significant
for familial cases (allelic p-value = 3 x 10^-2).

Frequencies of the genotypes for each biallelic marker were compared in the case-control
populations described above. Different sub-populations were compared according to phenotype
(sporadic and familial cases vs controls) to characterise the association. Figure 6
shows the results of the genotypic association analysis for markers localized in and around the BAP28
gene. This analysis compares the three genotype frequencies between the two populations studied.
The statistical test used is a Pearson chi-square test with 2 degrees of freedom.
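
The genotypic comparison can be sketched in the same way on a 2x3 table (cases vs controls, three genotypes). This is an illustrative sketch only; the function name `genotypic_chi_square` and the genotype counts are hypothetical, and every row and column total is assumed to be non-zero. With 2 degrees of freedom the chi-square tail probability has the closed form exp(-x/2).

```python
import math


def genotypic_chi_square(case_counts, ctrl_counts):
    """Pearson chi-square (2 df) on a 2x3 table of genotype counts.

    Each argument is (n_AA, n_AB, n_BB). Returns (statistic, p_value).
    """
    rows = (tuple(case_counts), tuple(ctrl_counts))
    n = sum(rows[0]) + sum(rows[1])
    col_totals = [rows[0][j] + rows[1][j] for j in range(3)]
    stat = 0.0
    for row in rows:
        row_total = sum(row)
        for j in range(3):
            expected = row_total * col_totals[j] / n
            stat += (row[j] - expected) ** 2 / expected
    # Survival function of a chi-square with 2 df: exp(-x / 2)
    return stat, math.exp(-stat / 2.0)


# Hypothetical genotype counts (AA, AB, BB)
stat, p = genotypic_chi_square((40, 40, 20), (20, 40, 40))
```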

The genotyped markers A4 (5-382/316), A19 (5-373/164), A28 (5-14/165), A50 (99-1585/373),
A42 (99-13798/284), and A53 (99-1601/402) are significant at the 5% level for the genotypic
test (p-values of 9 x 10^-3, 9 x 10^-2, 4 x 10^-2, 4 x 10^-2, 8 x 10^-2, and 3 x 10^-2, respectively)
for sporadic cases. The 4 markers A28 (5-14/165), A4 (5-382/316), A1 (5-381/133), and A55 (99-7182/49)
present a highly significant association for the genotypic test (p-values of 1 x 10^-5, 2 x 10^-5,
3 x 10^-6, and 1 x 10^-5, respectively) between informative sporadic cases and controls. The 2 markers
A31 (5-169/97) and A33 (99-1605/112) are significant for familial cases (p-values of 3 x 10^-2 and
2 x 10^-2, respectively).

The results of the association studies show that a polymorphism of the BAP28 gene is associated with
sporadic and/or familial prostate cancer. The biallelic markers A55 (99-7182/49), A1 (5-381/133),
A4 (5-382/316), A19 (5-373/164), A28 (5-14/165), A50 (99-1585/373), A42 (99-13798/284), A31 (5-169/97),
A33 (99-1605/112), and A53 (99-1601/402) can therefore be used in diagnostics with a test based on these
markers.

Haplotype Frequency Analysis

One way of increasing the statistical power of individual markers is to perform
haplotype association analysis.

Haplotype analysis for association of BAP28 markers and prostate cancer was performed
by estimating the frequencies of all possible haplotypes comprising biallelic markers of Table 5
in the case and control populations described in Example 5, and comparing these frequencies by
means of a chi-square statistical test (one degree of freedom). Haplotype estimations were
performed by applying the Expectation-Maximization (EM) algorithm (Excoffier L & Slatkin M,
1995), using the EM-HAPLO program (Hawley ME, Pakstis AJ & Kidd KK, 1994). More
particularly, two tests were performed: a haplo-max test and an Omnibus LR test, which
compares the profiles of haplotype frequencies.
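
To make the estimation step concrete, the following is a simplified sketch of an EM iteration for two biallelic markers. It is not the EM-HAPLO program: the function name `em_two_marker_haplotypes`, the 0/1 allele coding and the fixed number of iterations are assumptions made for the example. With two markers, only double heterozygotes have an unknown phase, and the E-step splits each of them between the two possible haplotype pairs in proportion to the current frequency estimates.

```python
def em_two_marker_haplotypes(genotypes, n_iter=100):
    """Estimate the four two-marker haplotype frequencies by EM.

    genotypes: list of (g1, g2), the number of copies (0, 1 or 2) of the
    allele coded '1' at marker 1 and at marker 2 for each individual.
    Haplotypes are keyed as (allele at marker 1, allele at marker 2).
    """
    if not genotypes:
        raise ValueError("no genotypes supplied")
    haps = [(0, 0), (0, 1), (1, 0), (1, 1)]
    freq = {h: 0.25 for h in haps}  # start from equal frequencies
    n_chrom = 2 * len(genotypes)
    for _ in range(n_iter):
        counts = {h: 0.0 for h in haps}
        for g1, g2 in genotypes:
            if g1 == 1 and g2 == 1:
                # E-step: phase unknown, either 11/00 (cis) or 10/01 (trans)
                cis = freq[(1, 1)] * freq[(0, 0)]
                trans = freq[(1, 0)] * freq[(0, 1)]
                w = cis / (cis + trans) if cis + trans > 0 else 0.5
                counts[(1, 1)] += w
                counts[(0, 0)] += w
                counts[(1, 0)] += 1.0 - w
                counts[(0, 1)] += 1.0 - w
            else:
                # Phase is unambiguous: read the two haplotypes directly
                first = (1 if g1 >= 1 else 0, 1 if g2 >= 1 else 0)
                second = (1 if g1 == 2 else 0, 1 if g2 == 2 else 0)
                counts[first] += 1.0
                counts[second] += 1.0
        # M-step: new frequencies are the expected haplotype proportions
        freq = {h: counts[h] / n_chrom for h in haps}
    return freq
```

In practice the estimation would be run separately in cases, in controls and in the pooled sample, and would stop on a convergence criterion rather than after a fixed number of iterations.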

The haplo-max test, which is based on haplotype frequency differences, selects the
difference showing the maximum positive (Max-M) or negative (Max-S) test value between cases
and controls (rejecting test values based on rare haplotype frequencies, i.e., with an estimated
number of haplotype carriers lower than 10); for one combination of markers there are therefore one
Max-M and one Max-S test value.
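
The selection performed by the haplo-max test can be sketched as follows. This is an illustrative reading, not the program actually used: the function name `haplo_max` is hypothetical, each haplotype is tested against all others with a 2x2 chi-square, and the rarity filter is assumed here to apply to the estimated number of haplotype copies summed over cases and controls.

```python
def haplo_max(case_freq, ctrl_freq, n_case_chrom, n_ctrl_chrom, min_count=10):
    """Return (max_m, max_s), each a (haplotype, chi_square) pair or None.

    max_m: strongest test value among haplotypes more frequent in cases.
    max_s: strongest test value among haplotypes more frequent in controls.
    min_count is applied to the pooled estimated count (an assumption).
    """
    max_m, max_s = None, None
    n = n_case_chrom + n_ctrl_chrom
    for hap in case_freq:
        a = case_freq[hap] * n_case_chrom            # copies in cases
        c = ctrl_freq.get(hap, 0.0) * n_ctrl_chrom   # copies in controls
        if a + c < min_count:
            continue  # rare haplotype: test value rejected
        b = n_case_chrom - a
        d = n_ctrl_chrom - c
        denom = (a + b) * (c + d) * (a + c) * (b + d)
        if denom == 0:
            continue
        chi2 = n * (a * d - b * c) ** 2 / denom
        if a * d > b * c and (max_m is None or chi2 > max_m[1]):
            max_m = (hap, chi2)
        elif a * d < b * c and (max_s is None or chi2 > max_s[1]):
            max_s = (hap, chi2)
    return max_m, max_s
```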

For one combination of 2, 3 or 4 markers, the Omnibus Likelihood ratio test
compares the profiles of haplotype frequencies between the two populations under study.
The null hypothesis is that both cases and controls are samples derived from the same population,
i.e., the haplotype frequencies are close. Using the E-M algorithm, one can calculate the haplotype
frequencies in cases, in controls and in the overall population. Once the haplotype frequencies are
estimated, a likelihood ratio test (LR test) can be derived. It should be emphasized that for one
combination of markers, only one LR test is obtained. If the data at hand were observed
haplotype frequencies, and provided there were no rare haplotypes, the LR test would follow a
chi-square distribution with h-1 degrees of freedom, h being the number of possible haplotypes. That is to
say: for 2 markers, a chi-square with 3 degrees of freedom; for 3 markers, a chi-square with 7
degrees of freedom; and for 4 markers, a chi-square with 15 degrees of freedom. As haplotype
frequencies are only inferred via the E-M algorithm and rare haplotypes occur, a permutation
procedure is more suitable.
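
The degrees of freedom quoted above follow from h = 2^k possible haplotypes for k biallelic markers. The sketch below computes them together with a likelihood ratio statistic of the usual 2 x sum(observed x ln(observed/expected)) form. The function names are hypothetical, and the sketch makes two simplifying assumptions: the estimated frequencies are treated as if they were observed counts, and the pooled frequency is taken as the weighted average of the case and control estimates rather than re-estimated by E-M on the pooled sample.

```python
import math


def degrees_of_freedom(n_markers):
    """h - 1, where h = 2 ** n_markers possible haplotypes."""
    return 2 ** n_markers - 1


def omnibus_lr(case_freq, ctrl_freq, n_case_chrom, n_ctrl_chrom):
    """Likelihood ratio statistic comparing two haplotype frequency profiles."""
    n_total = n_case_chrom + n_ctrl_chrom
    stat = 0.0
    for hap in set(case_freq) | set(ctrl_freq):
        p_case = case_freq.get(hap, 0.0)
        p_ctrl = ctrl_freq.get(hap, 0.0)
        # Pooled frequency under the null hypothesis (weighted average)
        p_pool = (p_case * n_case_chrom + p_ctrl * n_ctrl_chrom) / n_total
        for p, n in ((p_case, n_case_chrom), (p_ctrl, n_ctrl_chrom)):
            if p > 0:
                stat += 2.0 * n * p * math.log(p / p_pool)
    return stat


print([degrees_of_freedom(k) for k in (2, 3, 4)])  # [3, 7, 15]
```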

The results of haplotype analysis using all combinations of 2 or 3 biallelic markers from
the BAP28-related biallelic markers of Table 5 are represented in Figures 7 to 11. As mentioned
above, the profiles of haplotype frequencies have been compared by two main approaches:
individual haplotype tests and Omnibus Likelihood ratio tests. A permutation procedure allowed
assessment of the significance of the tests. The most significant haplotypes obtained are shown in
Figure 12. The familial cases and sporadic cases were analyzed separately, because the single-point
analyses showed different patterns of significant SNPs.

Haplotype frequency analysis for prostate cancer cases

The most significant haplotypes obtained with the cases of prostate cancer are shown in
Figure 7 a and b.

The two-marker haplotypes comprise the biallelic markers A1 (5-381/133), A4 (5-382/316),
A19 (5-373/164), A21 (5-375/200), A25 (5-376/266), A27 (5-377/227), A53 (99-1601/402),
A42 (99-13798/284), and A55 (99-7182/49).

The preferred two-marker haplotypes are described in Figure 7a as H1 to H8. All these
haplotypes comprise either the biallelic marker A53 (99-1601/402) or A42 (99-13798/284). One of
the more preferred haplotypes is haplotype H1, which comprises the biallelic markers
A53 (99-1601/402) and A27 (5-377/227), alleles TG respectively. This haplotype presented a p-value for the
haplotype frequency test of 3.9 x 10^-4 and an odds ratio of 1.80. Estimated haplotype frequencies
were 15.6 % in the cases and 9.3 % in the controls. This haplotype presented a p-value for the
likelihood ratio test of 1.7 x 10^-2. The p-value by permutation test is <1 x 10^-2 and the p-value for this
group of markers is 5 x 10^-2 by the omnibus LR test.
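
As a worked check, the odds ratio reported above can be recomputed from the two estimated haplotype frequencies, assuming it is the ratio of the odds of carrying the haplotype in cases to the odds in controls. The function name `odds_ratio` is hypothetical. When the control frequency is 0 the ratio is unbounded; the value of 100 reported for such haplotypes may reflect a cap applied by the analysis software, but this is an assumption.

```python
import math


def odds_ratio(freq_cases, freq_controls):
    """Odds ratio from haplotype frequencies given as fractions (0 to 1)."""
    if freq_controls == 0.0 or freq_cases == 1.0:
        return math.inf  # unbounded when the haplotype is absent in controls
    odds_cases = freq_cases / (1.0 - freq_cases)
    odds_controls = freq_controls / (1.0 - freq_controls)
    return odds_cases / odds_controls


# Haplotype H1: 15.6 % in the cases and 9.3 % in the controls
print(round(odds_ratio(0.156, 0.093), 2))  # 1.8
```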

The three-marker haplotypes comprise the biallelic markers A53 (99-1601/402),
A42 (99-13798/284), A51 (99-1587/281), A31 (5-169/97), A34 (5-2/178), A33 (99-1605/112),
A28 (5-14/165), A27 (5-377/227), A25 (5-376/266), A21 (5-375/200), A19 (5-373/164), A7 (99-7203/286),
A4 (5-382/316), A55 (99-7182/49), A56 (99-7186/212), A57 (99-7193/228), and A58 (99-7212/346).

The preferred three-marker haplotypes are described in Figure 7b as H435 to H452. All
these haplotypes comprise the biallelic marker A53 (99-1601/402). Most of them comprise the
biallelic marker A51 (99-1587/281). The more preferred haplotype is haplotype H435, which
comprises the biallelic markers A53 (99-1601/402), A51 (99-1587/281) and A34 (5-2/178), alleles
TAT, respectively. This haplotype presented a p-value for the haplotype frequency test of
3.3 x 10^[exponent illegible] and an odds ratio of 100. Estimated haplotype frequencies were 5.3 % in
the cases and 0 % in the controls. This haplotype presented a p-value for the likelihood ratio test of
7.3 x 10^[exponent illegible]. The p-value by permutation test is <1 x 10^-2 and the p-value for this
group of markers is 1 x 10^-2 by the omnibus LR test.

In conclusion, most preferred haplotypes for the cases of prostate cancer comprise the
biallelic marker A53 (99-1601/402). Some other preferred haplotypes for the cases of prostate
cancer comprise the biallelic markers A42 (99-13798/284) and/or A51 (99-1587/281). These
haplotypes can be used in diagnostics, more particularly in diagnostics of prostate cancer
susceptibility.

Haplotype frequency analysis for familial cases of prostate cancer 

The most significant haplotypes obtained with the familial cases of prostate cancer are
shown in Figure 8 a and b.

The two-marker haplotypes comprise the biallelic markers A51 (99-1587/281), A30 (99-1572/440),
A32 (5-171/204), A34 (5-2/178), A33 (99-1605/112), A29 (5-11/158), A27 (5-377/227),
A19 (5-373/164), A5 (99-7190/213), A56 (99-7186/212), and A54 (99-7177/81).

The preferred two-marker haplotypes are described in Figure 8a as H1 to H10. All these
haplotypes comprise either the biallelic marker A51 (99-1587/281) or A30 (99-1572/440). One of
the more preferred haplotypes is haplotype H4. For this combination of 2 markers, A30 (99-1572/440)
and A32 (5-171/204), the p-value obtained by the omnibus test from a chi-square distribution with
2 degrees of freedom is 2.4 x 10^-3. These markers are not in linkage disequilibrium. In the
individual haplotype test, this haplotype consisting of 2 biallelic markers, alleles TT respectively,
presented a p-value of 9.7 x 10^-5 and an odds ratio of 1.7. The p-value by permutation
test is <1 x 10^-2 and the p-value for this group of markers is 1 x 10^-2 by the omnibus LR test.
Tested on the whole case-control population, this haplotype gives estimated haplotype frequencies of
57.1% for familial cases (n=197) and 44.1% for controls (n=313). The estimated haplotype frequencies
do not follow the same trend in familial and sporadic cases; the trend in sporadic cases is the
same as in controls.

The three-marker haplotypes comprise the biallelic markers A48 (99-13810/214), A53
(99-1601/402), A42 (99-13798/284), A51 (99-1587/281), A30 (99-1572/440), A32 (5-171/204),
A34 (5-2/178), A33 (99-1605/112), A29 (5-11/158), A27 (5-377/227), A19 (5-373/164),
A7 (99-7203/286), A5 (99-7190/213), A56 (99-7186/212) and A54 (99-7177/81).

The preferred three-marker haplotypes are described in Figure 8b as H436 to H454. Most
of them comprise the biallelic markers A30 (99-1572/440), A51 (99-1587/281) and
A53 (99-1601/402). One of the more preferred haplotypes is haplotype H437, which comprises the biallelic
markers A53 (99-1601/402), A30 (99-1572/440) and A54 (99-7177/81), alleles ATC, respectively.
This haplotype presented a p-value for the haplotype frequency test of 3.6 x 10^-7 and an odds ratio of
2.13. Estimated haplotype frequencies were 44.8 % in the cases and 27.6 % in the controls. This
haplotype presented a p-value for the likelihood ratio test of 2.9 x 10^-3. The p-value by permutation
test is <1 x 10^-2 and the p-value for this group of markers is 1 x 10^-2 by the omnibus LR test.

In conclusion, most preferred haplotypes for the familial cases of prostate cancer comprise
the biallelic markers A30 (99-1572/440) and A51 (99-1587/281). These haplotypes can be used in
diagnostics, more particularly in diagnostics of familial prostate cancer susceptibility.

The most significant haplotypes obtained with the early onset familial cases of prostate
cancer are shown in Figure 9 a and b.

The two-marker haplotypes comprise the biallelic markers A42 (99-13798/284),
A51 (99-1587/281), A50 (99-1585/373), A30 (99-1572/440), A32 (5-171/204), A34 (5-2/178),
A33 (99-1605/112), A29 (5-11/158), A19 (5-373/164), A16 (5-370/197), A12 (5-379/80), A11 (5-384/312),
A7 (99-7203/286), A5 (99-7190/213), A4 (5-382/316), and A54 (99-7177/81).

The preferred two-marker haplotypes are described in Figure 9a as H1 to H13. Most of
these haplotypes comprise the biallelic marker A30 (99-1572/440). One of the more preferred
haplotypes is haplotype H1, which comprises the biallelic markers A30 (99-1572/440) and
A32 (5-171/204), alleles TT respectively. This haplotype presented a p-value for the haplotype frequency
test of 2.5 x 10^-6 and an odds ratio of 2.28. Estimated haplotype frequencies were 64.4 % in the cases
and 44.2 % in the controls. This haplotype presented a p-value for the likelihood ratio test of
8.3 x 10^-5. The p-value by permutation test is <1 x 10^-2 and the p-value for this group of markers is
5 x 10^-2 by the omnibus LR test.

The three-marker haplotypes comprise the biallelic markers A53 (99-1601/402),
A30 (99-1572/440), A32 (5-171/204), A34 (5-2/178), A33 (99-1605/112), A29 (5-11/158), A21 (5-375/200),
A19 (5-373/164), A12 (5-379/80), A11 (5-384/312), A7 (99-7203/286), A5 (99-7190/213),
A56 (99-7186/212), and A54 (99-7177/81).

The preferred three-marker haplotypes are described in Figure 9b as H421 to H443. All of
them comprise the biallelic marker A30 (99-1572/440) and almost all of them comprise the biallelic
marker A53 (99-1601/402). One of the more preferred haplotypes is haplotype H421, which
comprises the biallelic markers A53 (99-1601/402), A30 (99-1572/440) and A5 (99-7190/213),
alleles ATC, respectively. This haplotype presented a p-value for the haplotype frequency test of
2.3 x 10^-7 and an odds ratio of 2.7. Estimated haplotype frequencies were 52.3 % in the cases and 28.8 %
in the controls. This haplotype presented a p-value for the likelihood ratio test of 8.6 x 10^-4. The
p-value by permutation test is <1 x 10^-2 and the p-value for this group of markers is 1 x 10^-2 by
the omnibus LR test.

In conclusion, most preferred haplotypes for the early onset familial cases of prostate
cancer comprise the biallelic markers A30 (99-1572/440) and A53 (99-1601/402). These haplotypes
can be used in diagnostics, more particularly in diagnostics of early onset familial prostate cancer
susceptibility.

Haplotype frequency analysis for sporadic cases of prostate cancer 

The most significant haplotypes obtained with the sporadic cases of prostate cancer are
shown in Figure 10 a and b.

The two-marker haplotypes comprise the biallelic markers A53 (99-1601/402),
A42 (99-13798/284), A32 (5-171/204), A29 (5-11/158), A28 (5-14/165), A27 (5-377/227), A25 (5-376/266),
A19 (5-373/164), A16 (5-370/197), A4 (5-382/316), and A55 (99-7182/49).

The preferred two-marker haplotypes are described in Figure 10a as H1 to H12. The
most common biallelic markers in these haplotypes are A4 (5-382/316), A53 (99-1601/402), and A42
(99-13798/284). One of the more preferred haplotypes is haplotype H1, which comprises the biallelic
markers A53 (99-1601/402) and A4 (5-382/316), alleles TG respectively. This haplotype presented
a p-value for the haplotype frequency test of 1 x 10^-5 and an odds ratio of 2.09. Estimated haplotype
frequencies were 19.9 % in the cases and 10.6 % in the controls. This haplotype presented a p-value
for the likelihood ratio test of 4.4 x 10^-4. The p-value by permutation test is <1 x 10^-2 and the p-value
for this group of markers is 1 x 10^-2 by the omnibus LR test. The allelic association results,
which show that these markers are associated, are significant. By combining
the informativeness of a set of biallelic markers, haplotype analysis increases the power of the
association analysis, allowing false positive and/or negative data that may result from the single
marker studies to be eliminated. The significant trend seen in the single-point analysis appears to be
the same in the multipoint analysis. Tested on the whole case-control population, this haplotype gives
estimated haplotype frequencies of 19.6% for sporadic cases (n=294) and 10.6% for controls (n=313).
For the same haplotype, no significant result was found for familial cases.
Therefore, the association for sporadic cases differs from that for familial cases.

The three-marker haplotypes comprise the biallelic markers A53 (99-1601/402),
A42 (99-13798/284), A51 (99-1587/281), A31 (5-169/97), A34 (5-2/178), A27 (5-377/227),
A25 (5-376/266), A21 (5-375/200), A19 (5-373/164), and A55 (99-7182/49).

The preferred three-marker haplotypes are described in Figure 10b as H436 to H444. All
the haplotypes comprise the biallelic marker A53 (99-1601/402). The biallelic markers
A42 (99-13798/284) and A51 (99-1587/281) are frequently found in these haplotypes. One of the more
preferred haplotypes is haplotype H436, which comprises the biallelic markers A53 (99-1601/402),
A51 (99-1587/281) and A34 (5-2/178), alleles TAT respectively. This haplotype presented a p-value
for the haplotype frequency test of 5.4 x 10^-7 and an odds ratio of 100. Estimated haplotype
frequencies were 5.6 % in the cases and 0 % in the controls. This haplotype presented a p-value for
the likelihood ratio test of 3.5 x 10^-3. The p-value by permutation test is <1 x 10^-2 and the p-value for
this group of markers is 1 x 10^-2 by the omnibus LR test.

In conclusion, most preferred haplotypes for the sporadic cases of prostate cancer comprise
the biallelic marker A53 (99-1601/402). The biallelic markers A42 (99-13798/284),
A51 (99-1587/281) and A4 (5-382/316) are frequently found in the preferred haplotypes. These haplotypes
can be used in diagnostics, more particularly in diagnostics of sporadic prostate cancer susceptibility.

The most significant haplotypes obtained with the informative sporadic cases of prostate
cancer are shown in Figure 11 a and b.

The two-marker haplotypes comprise the biallelic markers A53 (99-1601/402),
A30 (99-1572/440), A32 (5-171/204), A29 (5-11/158), A16 (5-370/197), A4 (5-382/316), A1 (5-381/133),
and A55 (99-7182/49).

The preferred two-marker haplotypes are described in Figure 11a as H1 to H11. The
most common biallelic markers in these haplotypes are A4 (5-382/316) and A1 (5-381/133). One of the
more preferred haplotypes is haplotype H1, which comprises the biallelic markers A16 (5-370/197)
and A1 (5-381/133), alleles GA respectively. This haplotype presented a p-value for the haplotype
frequency test of 9.4 x 10^-8 and an odds ratio of 3.43. Estimated haplotype frequencies were 28.6 %
in the cases and 10.5 % in the controls. This haplotype presented a p-value for the likelihood ratio
test of 6.7 x 10^-7. The p-value by permutation test is <1 x 10^-2 and the p-value for this group of
markers is 1 x 10^-2 by the omnibus LR test.

The three-marker haplotypes comprise the biallelic markers A53 (99-1601/402),
A50 (99-1585/373), A30 (99-1572/440), A31 (5-169/97), A34 (5-2/178), A33 (99-1605/112),
A29 (5-11/158), A28 (5-14/165), A27 (5-377/227), A25 (5-376/266), A21 (5-375/200), A16 (5-370/197),
A4 (5-382/316), A1 (5-381/133), and A55 (99-7182/49).

The preferred three-marker haplotypes are described in Figure 11b as H415 to H430.
Most of the haplotypes comprise the biallelic markers A53 (99-1601/402) and A31 (5-169/97). The
biallelic markers A50 (99-1585/373), A16 (5-370/197), A4 (5-382/316), and A1 (5-381/133) are
frequently found in these haplotypes. One of the more preferred haplotypes is haplotype H415,
which comprises the biallelic markers A50 (99-1585/373), A16 (5-370/197), and A1 (5-381/133),
alleles CGA respectively. This haplotype presented a p-value for the haplotype frequency test of
3.8 x 10^-9 and an odds ratio of 4.25. Estimated haplotype frequencies were 26.7 % in the cases and 7.9 %
in the controls. This haplotype presented a p-value for the likelihood ratio test of 3.3 x 10^-6. The
p-value by permutation test is <1 x 10^-2 and the p-value for this group of markers is 1 x 10^-2 by
the omnibus LR test.

In conclusion, most preferred haplotypes for the informative sporadic cases of prostate
cancer comprise the biallelic markers A53 (99-1601/402), A31 (5-169/97), A4 (5-382/316), and A1
(5-381/133). The biallelic markers A50 (99-1585/373) and A16 (5-370/197) are also frequently found
in the preferred haplotypes. These haplotypes can be used in diagnostics, more particularly in
diagnostics of informative sporadic prostate cancer susceptibility.

Summary of haplotype frequency analysis

The most preferred two-marker haplotypes for familial and sporadic prostate
cancer are summarized in Figure 12. These haplotypes can be used in diagnostics of prostate cancer
susceptibility.

The statistical significance of the results obtained for the haplotype analysis was evaluated
by a phenotypic permutation test reiterated 1000 times on a computer. For this computer simulation,
data from the case and control individuals were pooled and randomly allocated to two groups which
contained the same number of individuals as the case-control populations used to produce the
haplotype frequency analysis data. A haplotype analysis was then run on these artificial groups for
the preferred haplotypes which presented a strong association with prostate cancer. This experiment
was reiterated 1000 times and the results are shown in Figure 12.

Figure 12A shows the association results for the preferred haplotype with A30 (99-1572/440)
and A32 (5-171/204), alleles TT, for each population and with 1000 permutations. This haplotype is
specific to familial prostate cancer, and more particularly to early onset prostate cancer. This
haplotype is highly significant and could be used in diagnostics.

Figure 12B shows the association results for the preferred haplotype with A16 (5-370/197)
and A1 (5-381/133), alleles GA, for each population and with 1000 permutations. This haplotype is
specific to sporadic prostate cancer. This haplotype is highly significant and could be used in
diagnostics.

Figure 12C shows the association results for the preferred haplotype with A53 (99-1601/402)
and A4 (5-382/316), alleles TG, for each population and with 1000 permutations. This haplotype is
specific to prostate cancer, and more particularly to sporadic prostate cancer. This haplotype is
highly significant and could be used in diagnostics.
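
The phenotypic permutation procedure described in this summary can be sketched as follows. This is a generic illustration under stated assumptions, not the program actually used: the function names are hypothetical, each individual is reduced to a single number (for example 1 if the individual carries the haplotype, 0 otherwise), and the test statistic is a simple difference of group means rather than a haplotype analysis re-run on each artificial group.

```python
import random


def mean_difference(cases, controls):
    """Example statistic: absolute difference between the group means."""
    return abs(sum(cases) / len(cases) - sum(controls) / len(controls))


def permutation_p_value(cases, controls, statistic, n_perm=1000, seed=0):
    """Empirical p-value obtained by shuffling case/control labels."""
    rng = random.Random(seed)
    observed = statistic(cases, controls)
    pooled = list(cases) + list(controls)
    n_cases = len(cases)
    hits = 0
    for _ in range(n_perm):
        # Reallocate individuals at random to two groups of the original sizes
        rng.shuffle(pooled)
        if statistic(pooled[:n_cases], pooled[n_cases:]) >= observed:
            hits += 1
    return hits / n_perm
```

With 1000 permutations the smallest non-zero p-value that can be resolved is 1 x 10^-3, so very strong associations are reported as falling below a bound rather than as an exact value.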

Example 6 

Preparation of Antibody Compositions to the BAP28 protein 

Substantially pure protein or polypeptide is isolated from transfected or transformed cells
containing an expression vector encoding the BAP28 protein or a portion thereof. The concentration of
protein in the final preparation is adjusted, for example, by concentration on an Amicon filter device, to
the level of a few micrograms/ml. Monoclonal or polyclonal antibody to the protein can then be
prepared as follows:

A. Monoclonal Antibody Production by Hybridoma Fusion

Monoclonal antibody to epitopes in the BAP28 protein or a portion thereof can be prepared
from murine hybridomas according to the classical method of Kohler, G. and Milstein, C., Nature
256:495 (1975) or derivative methods thereof. Also see Harlow, E., and D. Lane. 1988. Antibodies: A
Laboratory Manual. Cold Spring Harbor Laboratory, pp. 53-242.

Briefly, a mouse is repetitively inoculated with a few micrograms of the BAP28 protein or a
portion thereof over a period of a few weeks. The mouse is then sacrificed, and the antibody-producing
cells of the spleen are isolated. The spleen cells are fused by means of polyethylene glycol with mouse
myeloma cells, and the excess unfused cells are destroyed by growth of the system on selective media
comprising aminopterin (HAT media). The successfully fused cells are diluted and aliquots of the
dilution are placed in wells of a microtiter plate where growth of the culture is continued.
Antibody-producing clones are identified by detection of antibody in the supernatant fluid of the wells by
immunoassay procedures, such as ELISA, as originally described by Engvall (1980), and derivative
methods thereof. Selected positive clones can be expanded and their monoclonal antibody product
harvested for use. Detailed procedures for monoclonal antibody production are described in Davis, L. et
al. Basic Methods in Molecular Biology Elsevier, New York. Section 21-2.

B. Polyclonal Antibody Production by Immunization

Polyclonal antiserum containing antibodies to heterogeneous epitopes in the BAP28 protein or a
portion thereof can be prepared by immunizing a suitable non-human animal with the BAP28 protein or a
portion thereof, which can be unmodified or modified to enhance immunogenicity. The suitable
non-human animal selected is preferably a non-human mammal, usually a mouse, rat, rabbit, goat, or
horse. Alternatively, a crude preparation which has been enriched for BAP28 concentration can be
used to generate antibodies. Such proteins, fragments or preparations are introduced into the
non-human mammal in the presence of an appropriate adjuvant (e.g. aluminum hydroxide, RIBI, etc.)
which is known in the art. In addition, the protein, fragment or preparation can be pretreated with an
agent which will increase antigenicity; such agents are known in the art and include, for example,
methylated bovine serum albumin (mBSA), bovine serum albumin (BSA), Hepatitis B surface
antigen, and keyhole limpet hemocyanin (KLH). Serum from the immunized animal is collected,
treated and tested according to known procedures. If the serum contains polyclonal antibodies to
undesired epitopes, the polyclonal antibodies can be purified by immunoaffinity chromatography.

Effective polyclonal antibody production is affected by many factors related both to the
antigen and the host species. Also, host animals vary in response to site of inoculations and dose,
with both inadequate and excessive doses of antigen resulting in low titer antisera. Small doses (ng
level) of antigen administered at multiple intradermal sites appear to be most reliable. Techniques
for producing and processing polyclonal antisera are known in the art; see, for example, Mayer and
Walker (1987). An effective immunization protocol for rabbits can be found in Vaitukaitis, J. et al.
J. Clin. Endocrinol. Metab. 33:988-991 (1971).

Booster injections can be given at regular intervals, and antiserum harvested when the antibody titer
thereof, as determined semi-quantitatively, for example, by double immunodiffusion in agar against
known concentrations of the antigen, begins to fall. See, for example, Ouchterlony, O. et al. (1973).
Plateau concentration of antibody is usually in the range of 0.1 to 0.2 mg/ml of serum (about 12 µM).
Affinity of the antisera for the antigen is determined by preparing competitive binding curves, as
described, for example, by Fisher, D., Chap. 42 in: Manual of Clinical Immunology, 2d Ed. (Rose and
Friedman, Eds.) Amer. Soc. For Microbiol., Washington, D.C. (1980).
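
As a rough aid for relating the mass concentration of antibody quoted above to a molar concentration, the conversion can be sketched as follows. The molecular mass of about 150,000 g/mol for IgG is an assumption that is not stated in the text, and the function name is purely illustrative; under that assumption 0.1 to 0.2 mg/ml works out to roughly 0.7 to 1.3 µM, and a different assumed molecular mass or serum concentration would shift this figure.

```python
# Convert an antibody mass concentration (mg/ml) into a molar concentration (uM).
# ASSUMPTION: IgG molecular mass of ~150,000 g/mol; this value is not given in the text.
IGG_MOLAR_MASS_G_PER_MOL = 150_000.0


def mg_per_ml_to_micromolar(conc_mg_per_ml, molar_mass=IGG_MOLAR_MASS_G_PER_MOL):
    """Return the molar concentration in uM for a mass concentration in mg/ml."""
    grams_per_litre = conc_mg_per_ml      # 1 mg/ml is the same as 1 g/L
    mol_per_litre = grams_per_litre / molar_mass
    return mol_per_litre * 1e6            # mol/L -> umol/L


plateau_range_um = [mg_per_ml_to_micromolar(c) for c in (0.1, 0.2)]
print([round(v, 2) for v in plateau_range_um])  # [0.67, 1.33]
```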

Antibody preparations prepared according to either the monoclonal or the polyclonal protocol
are useful in quantitative immunoassays which determine concentrations of antigen-bearing substances
in biological samples; they are also used semi-quantitatively or qualitatively to identify the presence of
antigen in a biological sample. The antibodies may also be used in therapeutic compositions for killing
cells expressing the protein or reducing the levels of the protein in the body.

Example 7 

Tissue specificity of BAP28 expression.

Synthesis of the cDNA

The RNAs used were human RNAs from CLONTECH.




11.5 µl of water treated with DEPC (diethyl pyrocarbonate), 1 µl of human RNA (1
µg/µl) and 1 µl of oligo dT primer random (oligo dT hexamer) (20 pmol/µl) were heated at 74°C for
2 min 30 s. Then the enzymatic mixture was added. The enzymatic mixture comprised 4 µL 5X
Reaction Buffer, 1 µL dNTP 10 mM each, 0.5 µL Recombinant RNase Inhibitor 40 U/µL and 1 µL
MMLV Reverse Transcriptase 200 U/µL. The sample was heated 1 h at 42°C, and 5 min at 94°C.
Then 80 µl of water treated with DEPC were added (kit Advantage RT-for-PCR, CLONTECH
K1402-2). The synthesized cDNAs were stored at -20°C.
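
The volumes given for the reverse transcription can be checked with a short calculation. The sketch below simply sums the listed components and derives the working concentrations; the variable names are illustrative, and the "RNA-equivalent" concentration is only nominal, since it assumes (without support from the text) that all input RNA is represented in the final cDNA preparation.

```python
# Reverse-transcription mix as listed in the text (volumes in uL).
rt_mix_ul = {
    "DEPC-treated water": 11.5,
    "human RNA (1 ug/uL)": 1.0,
    "primer (20 pmol/uL)": 1.0,
    "5X Reaction Buffer": 4.0,
    "dNTP mix (10 mM each)": 1.0,
    "RNase inhibitor (40 U/uL)": 0.5,
    "MMLV reverse transcriptase (200 U/uL)": 1.0,
}
RNA_INPUT_UG = 1.0          # 1 uL of RNA at 1 ug/uL
DILUTION_WATER_UL = 80.0    # DEPC-treated water added after the reaction

reaction_volume_ul = sum(rt_mix_ul.values())
final_volume_ul = reaction_volume_ul + DILUTION_WATER_UL

# Working concentrations during the reaction itself.
buffer_strength_x = 5.0 * rt_mix_ul["5X Reaction Buffer"] / reaction_volume_ul
dntp_each_mm = 10.0 * rt_mix_ul["dNTP mix (10 mM each)"] / reaction_volume_ul

# Nominal RNA-equivalent concentration of the diluted cDNA (ASSUMES full conversion).
rna_equivalent_ng_per_ul = RNA_INPUT_UG * 1000.0 / final_volume_ul

print(reaction_volume_ul, final_volume_ul)         # 20.0 100.0
print(buffer_strength_x, dntp_each_mm)             # 1.0 0.5
print(rna_equivalent_ng_per_ul)                    # 10.0
```

On these figures, the 6 µL of cDNA used per PCR would correspond nominally to about 60 ng of input RNA.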
Amplification of the BAP28 amplicon 

The cDNAs used in this experiment came from the cDNA preparation described above and
from Marathon Ready cDNA from CLONTECH.

For each tissue, the following PCR reactions were done.

* First PCR reaction: The primer pair used in this PCR was PCTAexALF12 (SEQ
ID No 36)/BAP283Ra6283 (SEQ ID No 32). The primers were located in exon A' and exon 43 of the
BAP28 gene, respectively.
The PCR assay was performed using the following protocol:



Final volume                          50 µL
Water                                 19.8 µL
Buffer 3.3X                           15 µL
Mix dNTP (25mM each)                  4 µL
rTth XL PERKIN ELMER (2U/µL)          1 µL
Primer PCTAexALF12 (20pmol/µL)        1 µL
Primer BAP283Ra6283 (20pmol/µL)       1 µL
cDNA                                  6 µL



After 3 min of denaturation, 2.2 µl of Mg(OAc)2 25 mM were added. The PCR was
performed with 10 min at 94°C; 34 cycles of 30 sec at 94°C and 3 min at 67°C; and 10 min at 72°C.
* Second PCR reaction (Nested PCR): The primer pair used in this PCR was
PCTAexALF13n (SEQ ID No 37)/BAP283Ra6324n (SEQ ID No 33). The primers were also located in
exon A' and exon 43 of the BAP28 gene, respectively, and they were more downstream than the first
primer pair.

The PCR assay was performed using the following protocol:



Final volume                          50 µL
Water                                 20.8 µL
Buffer 3.3X                           15 µL
Mix dNTP (25mM each)                  4 µL
rTth XL PERKIN ELMER (2U/µL)          1 µL
Primer PCTAexALF13n (20pmol/µL)       1 µL
Primer BAP283Ra6324n (20pmol/µL)      1 µL
Product of PCR N°1                    5 µL

After 3 min of denaturation, 2.2 µl of Mg(OAc)2 25 mM were added. The PCR was
performed with 10 min at 94°C; 34 cycles of 30 sec at 94°C and 3 min at 67°C; and 10 min at 72°C.
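
The two PCR mixes described above can be checked for internal consistency with a few lines of arithmetic. The following sketch, in which all names are illustrative, verifies that each mix reaches the stated 50 µL once the 2.2 µL of Mg(OAc)2 are added, and derives the final magnesium, primer and buffer concentrations. The cycling time counts programmed hold times only and ignores instrument ramping, which is an assumption.

```python
import math

FINAL_VOLUME_UL = 50.0
MG_OAC_VOLUME_UL = 2.2      # added after the initial 3 min of denaturation
MG_OAC_STOCK_MM = 25.0

first_pcr_ul = {
    "water": 19.8, "buffer 3.3X": 15.0, "dNTP mix": 4.0, "polymerase": 1.0,
    "forward primer": 1.0, "reverse primer": 1.0, "cDNA": 6.0,
}
nested_pcr_ul = {
    "water": 20.8, "buffer 3.3X": 15.0, "dNTP mix": 4.0, "polymerase": 1.0,
    "forward primer": 1.0, "reverse primer": 1.0, "product of PCR 1": 5.0,
}


def total_volume_ul(mix):
    """Volume of the listed components plus the Mg(OAc)2 added afterwards."""
    return sum(mix.values()) + MG_OAC_VOLUME_UL


for name, mix in (("first PCR", first_pcr_ul), ("nested PCR", nested_pcr_ul)):
    assert math.isclose(total_volume_ul(mix), FINAL_VOLUME_UL), name

mg_final_mm = MG_OAC_VOLUME_UL * MG_OAC_STOCK_MM / FINAL_VOLUME_UL   # about 1.1 mM
primer_final_um = 20.0 * 1.0 / FINAL_VOLUME_UL                       # pmol/uL equals uM: 0.4
buffer_final_x = 3.3 * 15.0 / FINAL_VOLUME_UL                        # about 0.99X

# Programmed hold times only: 10 min + 34 x (0.5 min + 3 min) + 10 min.
cycling_minutes = 10 + 34 * (0.5 + 3) + 10                           # 139.0

print(round(mg_final_mm, 2), primer_final_um, round(buffer_final_x, 2), cycling_minutes)
```

The one-microlitre difference in water between the two mixes offsets the change from 6 µL of cDNA to 5 µL of first-round product, so both reactions end at the same final volume.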

The PCR products of the second PCR were analyzed on a 1% TAE 1X gel.
The results are shown in Figure 13. The segment comprising the exons 43 to A has been
observed in the following tissues: Marathon testis, Marathon hippocampus, Marathon leukemia
(chronic myelogenous K-562), cDNA cerebellum, cDNA substantia nigra, cDNA thalamus, cDNA
caudate nucleus, cDNA spinal cord, cDNA pituitary gland and cDNA mammary gland.

In contrast, this cDNA segment has not been observed in Marathon Brain, Marathon
Cerebellum, Marathon Cerebral Cortex, Marathon Hypothalamus, Marathon Fetal Kidney, Marathon
Thyroid, Marathon Bone Marrow, Marathon HL60, Marathon MOLT4, Marathon Fetal Liver,
Marathon Stomach, Marathon Prostate, cDNA Testis, cDNA Corpus Callosum, cDNA Amygdala,
cDNA Fetal Brain, cDNA Skeletal Muscle, cDNA Lung, cDNA Kidney, cDNA Placenta, cDNA
Spleen, cDNA Fetal Liver, cDNA Thyroid Gland, cDNA MOLT4, cDNA Adrenal Gland, cDNA
Trachea, cDNA Salivary Gland, cDNA HL60, cDNA Small Intestine, cDNA Pancreas, cDNA
Stomach, cDNA Bone Marrow, cDNA Thymus, cDNA Uterus, and cDNA Prostate.

An additional analysis of the tissue expression pattern was carried out by searching the
Genbank database for ESTs which show homology with the BAP28 cDNA. The results are shown
in Table 6.
Table 6

Tissue                                      Accession number in Genbank
placenta                                    AK001857; AI277866
colon                                       AW858897; AW858960
colon tumor metastasis                      AW962967
HeLa cell                                   AA098827
Adipose tissue white                        AA320776
LNCAP cells                                 AA357743
Total fetus                                 AA424101; AA460031; AA992680
germinal center B cell                      AA814857; AA814859
testis                                      AI023607; AL040338; AA437086
Fetal liver spleen                          AI033328
Fetal liver                                 AI114709
Fetal heart                                 AI150773
lung                                        AI348668; AW450486
kidney                                      AI582623
colon tumor                                 AI738790
pooled fetal lung testis B-cell             AI827817
stomach                                     AW389900
Multiple sclerosis                          N77431
fetal liver spleen                          T85649
anaplastic oligodendroglioma (organ: brain) AI356180
breast                                      AI905672

Example 8
Cloning of a BAP28 cDNA.

We cloned the BAP28 cDNA consisting of the exons 1 to 45.
Synthesis of cDNAs

The RNA used was total human prostate RNA from CLONTECH (Lot N°8040072 - Ref
Cat: 64038).

11.5 µL water treated with DEPC, 1 µL Total Human Prostate RNA (1 µg/µL) and 1
µL primer oligodT BAP28polyTcourt (20 pmol/µL) (tttttrttrttttttgtata : SEQ ID No 57) were heated 2
min 30 sec at 74°C. Then the enzymatic mixture was added. The enzymatic mixture comprised 4 µL
5X Reaction Buffer, 1 µL mix dNTP 10 mM each, 0.5 µL Recombinant RNase Inhibitor 40 U/µL and 1
µL MMLV Reverse Transcriptase 200 U/µL. The sample was heated 1 h at 42°C and 5 min at 94°C.
Then, 80 µL water treated with DEPC were added. The obtained cDNAs were stored at -20°C.
Amplification of the BAP28 segment to be cloned (Double PCR Reaction)
A first PCR with the primer pair BAP281LF12.1 (SEQ ID No 58) / BAP28LR6726.1
(SEQ ID No 59) was performed using the following protocol:

Final volume                          50 µL
Water                                 19.8 µL
Buffer 3.3X                           15 µL
Mix dNTP (25mM each)                  4 µL
rTth XL PERKIN ELMER (2U/µL)          1 µL
Primer BAP281LF12.1 (20pmol/µL)       1 µL
Primer BAP28LR6726.1 (20pmol/µL)      1 µL
Preparation of cDNA                   6 µL

After 3 min of denaturation, 2.2 µl of Mg(OAc)2 25 mM were added. The PCR was
performed with 10 min at 94°C; 34 cycles of 30 sec at 94°C and 8 min at 67°C; and 10 min at 72°C.

A second PCR reaction (Nested PCR) with the primer pair BAP28LF26SalI (SEQ ID
No 60) / BAP28LR6717SalI (SEQ ID No 61) was performed using the following protocol:

Final volume                          50 µL
Water                                 18.3 µL
Buffer 3.3X                           15 µL
Mix dNTP (25mM each)                  4 µL
VENT BIOLABS (2 U/µL)                 3.5 µL
Primer BAP281LF12.1 (20pmol/µL)       1 µL
Primer BAP28LR6726.1 (20pmol/µL)      1 µL
Product of PCR N°1                    5 µL

After 3 min of denaturation, 2.2 µl of Mg(OAc)2 25 mM were added. The PCR was
performed with 10 min at 94°C; 34 cycles of 30 sec at 94°C and 8 min at 67°C; and 10 min at 72°C.
Immediately after the end of the PCR, a phenol/chloroform extraction was performed in order to
avoid degradation. Finally, the PCR product was precipitated with NaCl and ethanol.

The PCR product and the cloning vector pGEM11Zf(+) were both digested by the
restriction endonuclease SalI. The digested vector was then dephosphorylated. The digested PCR
product was ligated with the digested and dephosphorylated pGEM11Zf(+) vector. E. coli DH10B
was transformed by the obtained vector and the bacteria containing the recombinant vector were
selected. The positive clones contained a 6.8 kb insert, which is the expected size for the entire
BAP28 cDNA. The sequencing of the insert showed a cDNA consisting of the exons 1 to 45 of
BAP28.

Example 9

Natural antisense structure.

The natural antisense structure observed in the BAP28 gene related to the PCTA-1 gene is
conserved in Drosophila. Indeed, the new CDS generated from the Genbank sequence AE00315
(gene CG10805) is located between the positions 97601 and 104127 of the sequence. Another CDS
is described on the opposite strand as the gene CG10806. This CDS is located between the positions
107695 and 104389 of the sequence. Thus, the distance between the two CDS is about 262 bp.
Therefore, as the 3'UTRs of the 2 genes are likely overlapping, the new gene CG10805 is a
natural antisense of the gene CG10806 and the natural antisense organization of BAP28 is conserved
in Drosophila.
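
The distance quoted above follows directly from the CDS coordinates. A minimal sketch of the calculation is given below; the class and function names are hypothetical, and the strand of each CDS is inferred from the order in which its two coordinates are given (ascending for one strand, descending for the opposite strand), which is an assumption about how the positions are reported.

```python
from typing import NamedTuple


class CDS(NamedTuple):
    name: str
    five_prime: int    # coordinate of the first coding base
    three_prime: int   # coordinate of the last coding base

    @property
    def strand(self) -> str:
        # ASSUMPTION: ascending coordinates mean the "+" strand, descending the "-" strand.
        return "+" if self.five_prime <= self.three_prime else "-"


def is_convergent(a: CDS, b: CDS) -> bool:
    """True when the CDS lie on opposite strands with their 3' ends facing each other."""
    if a.strand == b.strand:
        return False
    plus, minus = (a, b) if a.strand == "+" else (b, a)
    return plus.three_prime < minus.three_prime


def three_prime_distance(a: CDS, b: CDS) -> int:
    """Coordinate difference between the 3' ends of the two CDS."""
    return abs(a.three_prime - b.three_prime)


cg10805 = CDS("CG10805", 97601, 104127)
cg10806 = CDS("CG10806", 107695, 104389)

print(is_convergent(cg10805, cg10806))         # True
print(three_prime_distance(cg10805, cg10806))  # 262
```

A short gap between two convergent CDS is what makes overlapping 3'UTRs plausible, since each untranslated region extends beyond its CDS toward the other gene.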


While the preferred embodiment of the invention has been illustrated and described, it will
be appreciated that various changes can be made therein by one skilled in the art without
departing from the spirit and scope of the invention.
What is claimed is:

1. A purified or isolated BAP28 nucleic acid comprising at least 12 contiguous nucleotides
of the nucleotide sequence of SEQ ID No 1, or the complements thereof, wherein said contiguous
span comprises at least one of the following nucleotide positions of SEQ ID No 1: 1-50357,
50499-50963, 51257-52147, 52299-53234, 53394-53553, 53689-53837, 53943-54028, 54198-54740,
54896-55753, 55913-57385, 57495-58503, 58828-85946, 59355-85946, 86169-91228, and/or 91852
to 97662.

2. A purified or isolated nucleic acid encoding a BAP28 protein comprising at least 12
consecutive nucleotides of a nucleotide sequence selected from the group consisting of SEQ ID Nos
2 and 3 or the complement thereof, wherein said contiguous span comprises at least 1 of nucleotide
positions 1 to 4995 of SEQ ID No 2 or 3.

3. An isolated, purified or recombinant polynucleotide consisting essentially of a
contiguous span of 8 to 50 nucleotides of SEQ ID No 1 or the complement thereof, wherein said span
includes a BAP28-related biallelic marker.

4. A purified or isolated nucleic acid according to claim 3, wherein said contiguous span
comprises a BAP28-related biallelic marker selected from the group consisting of A1 to A58, and the
complements thereof.

5. A purified or isolated nucleic acid according to claim 3, wherein said contiguous span
comprises a BAP28-related biallelic marker selected from the group consisting of A1 to A27, A34,
A37 to A41, A43 to A49, A52, and A54 to A58, and the complements thereof.

6. A polynucleotide according to claim 3, wherein said contiguous span is 18 to 35 
nucleotides in length and said biallelic marker is within 4 nucleotides of the center of said 
polynucleotide. 


7. A polynucleotide according to claim 6, wherein said polynucleotide consists of said 
contiguous span and said contiguous span is 25 nucleotides in length and said biallelic marker is at 
the center of said polynucleotide. 
8. A polynucleotide according to claim 7, wherein said polynucleotide consists essentially
of a sequence selected from the following sequences: P1 to P58, and the complementary sequences
thereto.

9. A polynucleotide according to claim 3, wherein the 3' end of said contiguous span is 
located at the 3' end of said polynucleotide and said biallelic marker is present at the 3' end of said 
polynucleotide. 

10. An isolated, purified, or recombinant polynucleotide consisting essentially of a
contiguous span of 8 to 50 nucleotides of any one of SEQ ID Nos 1, 2, or 3 or the complement
thereof, wherein the 3' end of said contiguous span is located at the 3' end of said polynucleotide,
and wherein the 3' end of said polynucleotide is located within 20 nucleotides upstream of a
BAP28-related biallelic marker in said sequence.

11. A polynucleotide according to claim 10, wherein the 3' end of said polynucleotide is
located 1 nucleotide upstream of a BAP28-related biallelic marker in said sequence.

12. A polynucleotide according to claim 11, wherein said polynucleotide consists
essentially of a sequence selected from the following sequences: D1 to D58, and E1 to E58.






13. An isolated, purified, or recombinant polynucleotide consisting essentially of a
sequence selected from the following sequences: B1 to B38 and C1 to C38.

14. An isolated, purified, or recombinant polynucleotide which encodes a polypeptide
comprising a contiguous span of at least 6 amino acids of SEQ ID No 5, wherein said contiguous
span includes:

- at least 1 of the amino acid positions 1 to 1629 of the SEQ ID No 5; or,

- an amino acid selected from the group consisting of an asparagine at the amino acid
position 1694 of SEQ ID No 5, a valine at the amino acid position 1854 of SEQ ID No 5,
an asparagine at the amino acid position 1967 of SEQ ID No 5, a glutamic acid at the
amino acid position 2017 of SEQ ID No 5, and an alanine at the amino acid position 2050
of SEQ ID No 5.

15. An isolated, purified, or recombinant polynucleotide comprising a sequence selected
from the group consisting of SEQ ID Nos 4, and 9-13 and the complementary sequence thereto.
16. A polynucleotide according to any one of claims 1-3, 10, 13-15 attached to a solid
support.

17. An array of polynucleotides comprising at least one polynucleotide according to claim
16.

18. An array according to claim 17, wherein said array is addressable. 

19. A polynucleotide according to any one of claims 1-3, 10, 13-15 further comprising a
label.

20. A recombinant vector comprising a polynucleotide according to any one of claims 1-3,
10, 13-15.

21. A host cell comprising a recombinant vector according to claim 20.

22. A non-human host animal or mammal comprising a recombinant vector according to 

claim 20. 

23. A mammalian host cell comprising a BAP28 gene disrupted by homologous
recombination with a knock out vector, comprising a polynucleotide according to any one of claims
1-3 and 14.

24. A non-human host mammal comprising a BAP28 gene disrupted by homologous
recombination with a knock out vector, comprising a polynucleotide according to any one of claims
1-3 and 14.

25. A method of genotyping comprising determining the identity of a nucleotide at a
BAP28-related biallelic marker or the complement thereof in a biological sample.

26. A method according to claim 25, wherein said biological sample is derived from a 
single subject. 

27. A method according to claim 26, wherein the identity of the nucleotides at said biallelic
marker is determined for both copies of said biallelic marker present in said individual's genome.

28. A method according to claim 25, wherein said biological sample is derived from
multiple subjects.

29. A method according to claim 25, further comprising amplifying a portion of said
sequence comprising the biallelic marker prior to said determining step.

30. A method according to claim 29, wherein said amplifying is performed by PCR. 

31. A method according to claim 25, wherein said determining is performed by an assay
selected from the group consisting of a hybridization assay, a sequencing assay, a microsequencing
assay, and an enzyme-based mismatch detection assay.

32. A method of estimating the frequency of an allele of a BAP28-related biallelic marker
in a population comprising:

a) genotyping individuals from said population for said biallelic marker according to the
method of claim 25; and

b) determining the proportional representation of said biallelic marker in said population.

33. A method of detecting an association between a genotype and a trait, comprising the
steps of:

a) determining the frequency of at least one BAP28-related biallelic marker in a trait positive
population according to the method of claim 32;

b) determining the frequency of at least one BAP28-related biallelic marker in a control
population according to the method of claim 32; and

c) determining whether a statistically significant association exists between said genotype
and said trait.

34. A method of estimating the frequency of a haplotype for a set of biallelic markers in a
population, comprising:

a) genotyping at least one BAP28-related biallelic marker according to claim 27 for each
individual in said population;

b) genotyping a second biallelic marker by determining the identity of the nucleotides at said
second biallelic marker for both copies of said second biallelic marker present in the genome of each
individual in said population; and

c) applying a haplotype determination method to the identities of the nucleotides determined
in steps a) and b) to obtain an estimate of said frequency.

35. A method according to claim 34, wherein said haplotype determination method is
selected from the group consisting of asymmetric PCR amplification, double PCR amplification of
specific alleles, the Clark algorithm, or an expectation-maximization algorithm.

36. A method of detecting an association between a haplotype and a trait, comprising the
steps of:

a) estimating the frequency of at least one haplotype in a trait positive population
according to the method of claim 34;

b) estimating the frequency of said haplotype in a control population according to the
method of claim 34; and

c) determining whether a statistically significant association exists between said haplotype
and said trait.

37. A method according to claim 33, wherein said genotyping steps a) and b) are
performed on a single pooled biological sample derived from each of said populations.

38. A method according to claim 33, wherein said genotyping steps a) and b) are performed
separately on biological samples derived from each individual in said populations.

39. A method according to either claim 33 or 36, wherein said trait is cancer.

40. A method according to either claim 33 or 36, wherein said control population is a trait
negative population.

41. A method according to either claim 33 or 36, wherein said case control population is a
random population.

42. A method of determining whether an individual is at risk of developing prostate cancer,
comprising:

a) genotyping at least one BAP28-related biallelic marker according to the method of claim
27; and

b) correlating the result of step a) with a risk of developing prostate cancer.

43. A method according to any one of claims 25, 32, 33, 34, 36, and 42 wherein said
BAP28-related biallelic marker is selected from the group consisting of A1 to A58 and the
complements thereof.

44. A method according to claim 42, wherein said BAP28-related biallelic marker is
selected from the following list of biallelic markers: A1, A4, 16, A30, A31, A42, A50, A51, and
A53, and the complements thereof.

45. A diagnostic kit comprising a polynucleotide according to any one of claims 3-13 and
16-19.



46. An isolated, purified, or recombinant polypeptide comprising a contiguous span of at
least 6 amino acids of SEQ ID No 5, wherein said contiguous span includes:

- at least 1 of the amino acid positions 1 to 1629 of the SEQ ID No 5; or,

- an amino acid selected from the group consisting of an asparagine at the amino acid
position 1694 of SEQ ID No 5, a valine at the amino acid position 1854 of SEQ ID No 5,
an asparagine at the amino acid position 1967 of SEQ ID No 5, a glutamic acid at the
amino acid position 2017 of SEQ ID No 5, and an alanine at the amino acid position 2050
of SEQ ID No 5.



47. An isolated or purified antibody composition capable of selectively binding to an
epitope-containing fragment of a polypeptide according to claim 46, wherein said epitope comprises:

- at least 1 of the amino acid positions 1 to 1629 of the SEQ ID No 5; or,

- an amino acid selected from the group consisting of an asparagine at the amino acid
position 1694 of SEQ ID No 5, a valine at the amino acid position 1854 of SEQ ID No 5,
an asparagine at the amino acid position 1967 of SEQ ID No 5, a glutamic acid at the
amino acid position 2017 of SEQ ID No 5, and an alanine at the amino acid position 2050
of SEQ ID No 5.

48. A method for the screening of a candidate substance interacting with a BAP28
polypeptide comprising the following steps:

a) providing a polypeptide according to claim 46;

b) obtaining a candidate substance;

c) bringing into contact said polypeptide with said candidate substance; and

d) detecting the complexes formed between said polypeptide and said candidate substance.



49. A method for screening of a candidate substance that modulates the expression of the
BAP28 gene comprising the following steps:

a) providing a recombinant cell host containing a nucleic acid, wherein said nucleic acid
comprises a nucleotide sequence of the 5' regulatory region (2996-4996 of SEQ ID No 1) or a
biologically active fragment or variant thereof located upstream of a polynucleotide encoding a
detectable protein;

- obtaining a candidate substance; and

- determining the ability of the candidate substance to modulate the expression levels of
the polynucleotide encoding the detectable protein.

50. A computer readable medium having stored thereon a sequence selected from the
group consisting of a nucleic acid code comprising one of the following:

a) a contiguous span of at least 12 nucleotides of SEQ ID No 1, wherein said contiguous
span comprises at least 1 of the following nucleotide positions of SEQ ID No 1: 1-50357,
50499-50963, 51257-52147, 52299-53234, 53394-53553, 53689-53837, 53943-54028, 54198-54740,
54896-55753, 55913-57385, 57495-58503, 58828-85946, 59355-85946, 86169-91228, and/or 91852
to 97662;

b) a contiguous span of at least 12 nucleotides of SEQ ID No 1 or the complement thereof,
wherein said contiguous span comprises at least 1 nucleotide selected from the group consisting of
the following nucleotide positions of SEQ ID No 1: 4997-5076, 5371-5544, 6121-6337,
9877-10018, 11522-11623, 12521-12661, 13453-13664, 13824-13957, 15376-15478, 16855-16965,
17378-17495, 18535-18642, 21446-21541, 21999-22087, 23036-23247, 23546-23667,
24270-24461, 26287-26470, 26611-26747, 28068-28260, 32540-32709, 33112-33270, 34586-34828,
35156-35287, 36660-36763, 36934-37077, 37803-37921, 38017-38138, 40365-40493,
42618-42848, 43452-43578, 44836-44999, 48223-48269, and 49656-49779;

c) a contiguous span of at least 12 nucleotides of SEQ ID No 1 or the complements
thereof, wherein said contiguous span comprises at least one BAP28-related biallelic marker selected
from the group consisting of A1 to A58;

d) a contiguous span of at least 12 nucleotides of a nucleic acid sequence selected from the
group consisting of SEQ ID Nos 2 and 3 or the complements thereof, wherein said contiguous span
comprises at least 1 of nucleotide positions 1 to 4995 of SEQ ID No 2 or 3;

e) a contiguous span of at least 12 nucleotides of a nucleic acid sequence selected from the
group consisting of SEQ ID Nos 2 and 3 or the complements thereof, wherein said contiguous span
comprises at least 1 of nucleotide positions 1 to 2033, 2160 to 2348 and 2676 to 4995 of SEQ ID No
2 or 3;

f) a contiguous span of at least 12 nucleotides of a nucleic acid sequence selected from the
group consisting of SEQ ID Nos 1-3 or the complements thereof, wherein said contiguous span
comprises at least 1 of any one of the following ranges of nucleotide positions of:

(1) SEQ ID No 1: 1-2500, 2501-5000, 5001-7500, 7501-10000, 10001-12500,
12501-15000, 15001-17500, 17501-20000, 20001-22500, 22501-25000, 25001-27500, 27501-30000,
30001-32500, 32501-35000, 35001-37500, 37501-40000, 40001-42500, 42501-45000,
45001-47500, 47501-50000, 50001-50357, 50499-50963, 51257-52147, 52299-53234, 53394-53553,
53689-53837, 53943-54028, 54198-54740, 54896-55753, 55913-57385, 57495-58503,
58828-85946, 59355-85946, 86169-91228, and/or 91852 to 97662;

(2) SEQ ID No 2: 1 to 500, 501 to 1000, 1001 to 1500, 1501 to 2000, 2001 to 2500, 2501
to 3000, 3001 to 3500, 3501 to 4000, 4001 to 4500, 4501 to 4995, 5000 to 5500, 5501 to 6000, 6001
to 6500, and 6501 to 6782; and,

(3) SEQ ID No 3: 1 to 500, 501 to 1000, 1001 to 1500, 1501 to 2000, 2001 to 2500, 2501
to 3000, 3001 to 3500, 3501 to 4000, 4001 to 4500, 4501 to 4995, 5000 to 5500, 5501 to 6000, 6001
to 6500, 6501 to 7000, 7001 to 7500, 7501 to 7932; and

g) a nucleotide sequence selected from the group consisting of SEQ ID Nos 4, and 9-13;
and,

h) a nucleotide sequence complementary to any one of the preceding nucleotide sequences.

51. A computer readable medium having stored thereon a sequence consisting of a
polypeptide code comprising a contiguous span of at least 6 amino acids of SEQ ID No 5, wherein
said contiguous span includes:

- at least 1 of the amino acid positions 1 to 1629 of the SEQ ID No 5; or,

- an amino acid selected from the group consisting of an asparagine at the amino acid
position 1694 of SEQ ID No 5, a valine at the amino acid position 1854 of SEQ ID No 5,
an asparagine at the amino acid position 1967 of SEQ ID No 5, a glutamic acid at the
amino acid position 2017 of SEQ ID No 5, and an alanine at the amino acid position 2050
of SEQ ID No 5.

52. A computer system comprising a processor and a data storage device wherein said data
storage device comprises a computer readable medium according to claim 50 or 51.

53. A computer system according to claim 52, further comprising a sequence comparer and 
a data storage device having reference sequences stored thereon. 

54. A computer system of Claim 53 wherein said sequence comparer comprises a computer
program which indicates polymorphisms.

55. A computer system of Claim 52 further comprising an identifier which identifies
features in said sequence.

56. A method for comparing a first sequence to a reference sequence, comprising the steps 

of: 
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reading said first sequence and said reference sequence through use of a computer program 

which compares sequences; and 

determining differences between said first sequence and said reference sequence with said 

computer program, 

5 wherein said first sequence is selected from the group consisting of a nucleic acid code 

comprising one of the following: 

a) a contiguous span of at least 12 nucleotides of SEQ ID No 1, wherein said contiguous 
span comprises at least lof the following nucleotide positions of SEQ ID No 1: 1-50357, 50499- 
50963, 51257-52147, 52299-53234, 53394-53553, 53689-53837, 53943-54028, 54198-54740, 

10 54896-55753, 55913-57385, 57495-58503, 58828-85946, 59355-85946, 86169-91228, and/or 91852 
to 97662; 

b) a contiguous span of at least 12 nucleotides of SEQ ID No 1 or the complement thereof, 
wherein said contiguous span comprises at least 1 nucleotides selected from the group consisting of 
the following nucleotide positions of SEQ ID No 1: 4997-5076, 5371-5544, 6121-6337, 9877- 

15 10018, 11522-11623, 12521-12661, 13453-13664, 13824-13957, 15376-15478, 16855-16965, 
17378-17495, 18535-18642, 21446-21541, 21999-22087, 23036-23247, 23546-23667, 24270- 
24461, 26287-26470, 26611-26747, 28068-28260, 32540-32709, 33112-33270, 34586-34828, 
35156-35287, 36660-36763, 36934-37077, 37803-37921, 38017-38138, 40365-40493, 42618- 
42848, 43452-43578, 44836-44999, 48223-48269, and 49656-49779; 

20 c) a contiguous span of at least 12 nucleotides of SEQ ID No 1 or the complements 

thereof, wherein said contiguous span comprises at least one BAP28-related biallelic marker selected 
from the group consisting of Al to A58; 

d) a contiguous span of at least 12 nucleotides of a nucleic acid sequence selected from the 
group consisting of SEQ ID Nos 2 and 3 or the complements thereof, wherein said contiguous span 

25 comprises at least 1 of nucleotide positions 1 to 4995 of SEQ ID No 2 or 3; 

e) a contiguous span of at least 12 nucleotides of a nucleic acid sequence selected from the 
group consisting of SEQ ID Nos 2 and 3 or the complements thereof, wherein said contiguous span 
comprises at least 1 of nucleotide positions 1 to 2033, 2160 to 2348 and 2676 to 4995 of SEQ ID No 
2 or 3; 

30 f) a contiguous span of at least 12 nucleotides of a nucleic acid sequence selected from the 

group consisting of SEQ ID Nos 1-3 or the complements thereof, wherein said contiguous span 
comprises at least 1 of any one of the following ranges of nucleotide positions of: 

(1) SEQ ID No 1: 1-2500, 2501-5000,5001-7500, 7501-10000, 10001-12500, 12501- 
15000, 15001-17500, 17501-20000, 20001-22500, 22501-25000, 25001-27500, 27501-30000, 

35 30001-32500, 32501-35000, 35001-37500, 37501-40000, 40001-42500, 42501-45000, 45001- 
47500, 47501-50000, 50001-50357, 50499-50963, 51257-52147, 52299-53234, 53394-53553, 
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53689-53837, 53943-54028, 54198-54740, 54896-55753, 55913-57385, 57495-58503, 58828- 
85946, 59355-85946, 86169-91228, and/or 91852 to 97662; 

(2) SEQIDNo2: 1 to 500, 501 to 1000, 1001 to 1500, 1501 to 2000, 2001 to 2500, 2501 
to 3000, 3001 to 3500, 3501 to 4000, 4001 to 4500, 4501 to 4995, 5000 to 5500, 5501 to 6000, 6001 

5 to 6500, and 6501 to 6782; and, 

(3) SEQ ID No 3: 1 to 500, 501 to 1000, 1001 to 1500, 1501 to 2000, 2001 to 2500, 2501 
to 3000, 3001 to 3500, 3501 to 4000, 4001 to 4500, 4501 to 4995, 5000 to 5500, 5501 to 6000, 6001 
to 6500, 6501 to 7000, 7001 to 7500, 7501 to 7932; and 

g) a nucleotide sequence selected from the group consisting of SEQ ID Nos 4, and 9-13; and,

h) a nucleotide sequence complementary to any one of the preceding nucleotide sequences; and
a polypeptide code comprising a contiguous span of at least 6 amino acids of SEQ ID No
5, wherein said contiguous span includes:

- at least 1 of the amino acid positions 1 to 1629 of the SEQ ID No 5; or,
- an amino acid selected from the group consisting of an asparagine at the amino acid
position 1694 of SEQ ID No 5, a valine at the amino acid position 1854 of SEQ ID No 5,
an asparagine at the amino acid position 1967 of SEQ ID No 5, a glutamic acid at the
amino acid position 2017 of SEQ ID No 5, and an alanine at the amino acid position 2050
of SEQ ID No 5.

57. Use of a polynucleotide comprising a contiguous span of at least 12 nucleotides of the
SEQ ID No 1 or the complementary sequence thereto for determining the identity of the nucleotide
at a BAP28-related biallelic marker.

58. Use according to claim 57 in a microsequencing assay, wherein the 3' end of said
contiguous span is located at the 3' end of said polynucleotide and wherein the 3' end of said
polynucleotide is located 1 nucleotide upstream of said BAP28-related biallelic marker in said
sequence.

59. Use according to claim 57 in a hybridization assay, wherein said span includes said
BAP28-related biallelic marker.

60. Use according to claim 57 in a specific amplification assay, wherein the 3' end of said
contiguous span is located at the 3' end of said polynucleotide and said biallelic marker is present at
the 3' end of said polynucleotide.

61. Use according to claim 57 in a sequencing assay, wherein the 3' end of said contiguous
span is located at the 3' end of said polynucleotide.



62. Use according to claim 57, wherein said BAP28-related biallelic marker is a biallelic marker
selected from the group consisting of A1 to A58, and the complements thereof.

63. Use according to claim 57, wherein said BAP28-related biallelic marker is a biallelic marker
selected from the group consisting of A1 to A27, A34, A37 to A41, A43 to A49, A52, and A54 to
A58, and the complements thereof.

64. Use according to claim 57, wherein said BAP28-related biallelic marker is a biallelic marker
selected from the group consisting of A1, A4, A16, A30, A31, A42, A50, A51, and A53, and the
complements thereof.
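
The primer geometry recited in claim 58 (3' end of the primer located 1 nucleotide upstream of the biallelic marker) can be illustrated with a minimal sketch. It assumes 1-based inclusive coordinates on SEQ ID No 1 and a primer length of 19 nucleotides, which is the length of the ".mis" primer_bind features given in the sequence listing; the function name and the default length are illustrative assumptions, not part of the claims.

```python
def microsequencing_primers(marker_pos, primer_len=19):
    """Return ((fwd_start, fwd_end), (rev_start, rev_end)) for a marker.

    Coordinates are 1-based and inclusive. The forward primer ends one
    nucleotide upstream of the marker; the second pair is the span of the
    complementary-strand primer, which begins one nucleotide downstream.
    primer_len=19 is an assumption taken from the listed .mis features.
    """
    if primer_len < 1:
        raise ValueError("primer_len must be positive")
    if marker_pos - primer_len < 1:
        raise ValueError("marker too close to the start of the sequence")
    forward = (marker_pos - primer_len, marker_pos - 1)
    reverse = (marker_pos + 1, marker_pos + primer_len)
    return forward, reverse


# Marker 5-381-133 lies at position 4972 of SEQ ID No 1.
print(microsequencing_primers(4972))
```

For the marker at position 4972 this yields the spans 4953 to 4971 and 4973 to 4991, which agree with the 5-381-133.mis entries of the sequence listing.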
ABSTRACT
The present invention is directed to BAP28 polypeptides, to BAP28 cDNA sequences
encoding BAP28 polypeptides, and to the genomic DNA sequence of the BAP28 gene as well as to
regulatory regions located at the 5'- and 3'-ends of the BAP28 coding region. The invention also
deals with antibodies directed specifically against such polypeptides that are useful as diagnostic
reagents. The invention further encompasses biallelic markers of the BAP28 gene useful in genetic
analysis. The invention concerns an association of the BAP28-related biallelic markers with prostate
cancer. Therefore, the invention contemplates methods for the diagnosis and treatment of prostate
cancer.

Figure 3

H -MTSLAQQLQRLALPQSDASLLSRD EVASLLFDPKEAATIDRDTAFAIGCTGLEEL 

D MSTALAQQLQKLAAPQSSVTLADAR SRASILFDPKEAATKDRRSIYEIGLTGLQEL 

A MSSSIVSQLQALKSVLQADTEPSKRP — FTRPSILFSPKEAADFDIESIYELGLKGLEVL 

S MASSLQKQLKNIQSNNVLKINKIRR APSLLYDPKVAADMDLEE1YVTAVSGFHEL 

y -MS S L S DQLAQVASNNAT VALDRKRRQKLHS ASL I YNS KTAATQD YDFI FENAS KALEEL 
C MATSLTSQLENLRTSAARHLTVEKR HVSLLFDRKEANKLSNETAHRIGVAGLEQM 



H LGIDPSFE-QFEAPLFSQLAKTLERSVQTKAVNKQIDENISLFLIHIjSPYFLLKPAQKCL 

D tdfnpafk-efqltlfdeatltlersvelpeinkmldaaiakflrllspylllrpahmaf 

A GNKDERFK-NyMNDLFSHKSKEIDRELLGKEENARIDSSISSYLRLLSGYLQFRASLETL 

S AVHEPRLL-YFEKTLLGEQS VQVDRVLLNRTENEKI DLECVQILRLLAPFFTEKNALKVL 

Y S Q I EPKFA- 1 FS RTLFS E S S I SLDRNVQT KEE IKDLDNAI NAYLLLAS S KWYLAPTLHAT 

C KRiDPVFDTEFANDLFSEERVDFVRSMI»EKGANEELNKOIEKLLLELSPYLQHFACQQVL 



H EWLIHRFHIHLYNQ-DSLIACVLPYHETRIFVRVTQLLKINNSKHR-MFWLLPVKQSGVPL 

D EWLLRRFQVHE YNRSEVMAL I LPYHETMI FVQIVKTMRLRSSDGD-WYWLRPLQRPGVPL 

A EYLIRRYKIHIYNLEDWLCALPYHDTHAFVRIVQLLSTGNSK WKFLDGVKNSGAPP 

S EWL IRRFS I HE YVS DE FI L SFL P FH DHP FFARI LGCSKPKSRP LLFLEtfAIKMPVTL 

Y EWL VRRFQX H VKN T EMLLL S TLN Y YQT PVFKRI L S 1 1 KL PPL F NCLSNFVRSEKPP 

C EFLIHTYQIYSFNAETLLLTFLPFHETKVYSRLLRILDFDWKRSKEWQFMQQFTKTETPI 



H AKGTLITHCYK-DLGFMDFICSLVTKSVKVFAEYPGSSAQLRVLLAFYASTIVSALVAAE 

D AKTAIINRAAS-NPAFLGFICQSTQKAVKELGPR AHQLQAQINFYATWVGALQTAK 

A PR5VIVQQCIR-DKQVLEALCDYASR-TKKYQPS KPV-VSFSTAWVGVLGSVP 

S SRADIVHALSR-DKEFFAMFAQFVQNTAESHNMY PELARFWAGTMMEVLVAWH 

Y TALTMIKLFN — DMDFLKLYTSYLDQCIKHNATY TNQLLFTTCCFINWAFNS 

C PFTSIARATLSSKHSIITCITDHIRHAVEIVGSD-YLEIKHPILFNFHAKLLLSMFTDPE 



H D-VSDNIIAKLFPYIQKGLKS SL FDYRAATYMI I CQI SVKVTMENT FVNSLASQI I K 

D P-LQDWHITTILESLLRGLIS DNIDFMAAAYVIVAQLVSRTKLKSKVCNALLERVAN 

A T — VD GD I VKT I L P FVD S GL Q S GVKG CL DQQAG ALMWGMLANRAVL NT NL I KRLM RS II D 

S S5NEDPNVLLDRFFLRVSYAVSYVSSIDFQIAGFMLLSSIAASLPLSPSIIPPLVSAITD 

Y N-UDEKLNQLVPILLEISAKLLASKSKDCQIAAHTILWFATALPLKKTI ILAAMETILS 

C K- VDEMMLAKLMPF I ENGlKS PMKS FRYS AMWI S QLVLT VKLKDE VLN SMCKLL I T 



H T-LTKIPStlKDGLSCLIVLLQRQKPESLGKKPFPHLCNVPDLrTILHGISE-TYDVSPL 

D CPFERLHSESLLLLVCIYGKQQAALP-HFKPETILNLVGKKWLISTI.SSLAKGNIAIQSI 

A I — GREHAKE SSDP-HSLRLSLMALINFVQLQSVDLIPRK 

S R LS FDN IKP ALI CVGHLLQFCSSFEFDHEOLE 

Y NL DAKEAKHS ALLT I CKL FQTLKGQGNVDQL P SKI FKLFD 

C K MRSDT AAASLSTLMWFQQQNVQSLSKN 



H LRYML PHLWS 1 1 HHVT G EETEGMDGQIYKRHLEAILTKISLKNNLDHLLAS-LLFEE 

D CMPLMTGAVAAIRDDDASSNSCKLFLDNliLSEVPMPKPTAQQLINCFLDTYVETAIDAPE 

A ALDLFNEISSSDDK CCEVLASIIETVP VSNLVDHLISK-VFSLC 

S K LE5FGASSLLIELS QEHRLDEFFVS YW VS-- LIKS-RKQKD 

Y SKFDTVSILTFXDK EDKPVCDKFXTSYT RS IARYDRS KLNI 

C TLKKLLRHEEG — IDVWKILKELSERT DT TKFFNVLWKE 
Figure 3 (following)

H Y I S YSSQEE MD SN-KVSLLNE QFLPLI RLLESK- YPRTL D WLEEHLKE I AD- -L 

D PMETNSNEDDDTIVIDSDDEIETEKTTFQAWYSTVLEK-LERRYPEAFDLSVKEALR — s 

A MTQYQKNSD FRSS TSGSWAKKFLVWSKK-YPAELRAAVPKFLEATEVQ-S 

5 KKRLISLLD TSIS-QlRVTHEQAKFIiLSVIPVN-QDFKALQSYRRILDSVIQP-E 

Y ILSLLKKIR LERY-EVRLIITDLIYLSEILEDKSQLVELFEYFISINEDtVLK-C 

C LIVLSKDAES EDNTLASDVLIETIEDASILTGDQ-AGTILKLILQEGMDGNIFDNK 



H KKQELFHQFVSLSTSGGKYQFLADSDTSLMLSLNHPLAPVRILAMNHLKKIMKTSKEG-V 

D KSSTSNRQKALKLALGFRLNTTDEKAKHAYEKLYHYSADWRLSAVQKLLQNLNVTKKRER 

A KKEDLKLEMLSCMLDGNSDMSHPFVDSKLWFRLHHPRAAVRCAALSSLNGVLKDDSSKAE 

S RKEGKLDNLINTLQDKKKSSTFSKKDREVLLKKIS EIDSQTSFEQCLAYAD-SAAD 

Y LKSLGLTGELFEIR1TTSLFTNADVNTDIVKQLSDPVETTKKDTASFQTFLDKHSELINT 

C KKLKSNIRAIGMRFAKQFDAIHAELKAKDKKTI.KNVIKEYQIEDIVQFASEAVAATQSEE 



H DESFIKEAVLARLGDDNIDVVLSAISA-FEIFKEHFSSEVTISNLLNLFORAEI.SKNGEW 

D SVKLLQECLPDRINDDSGAWSTLIjSLPTEELAEMLGPIjPLAQTIiCHLLYRAQSEKDEEW 

A NLVTIQDAILRQLWDDD1AWQAALSF — DKL PNI I T S S GLLDALLHWKRC VG I LVS G V 

S LDSSVFISLLSKFG-DKIPFLLFCIAN GSERI I ILSLI ELRKT I EENKDVDY 

Y TNVSMLTETGERYK-KVLSLFTEAIGK G — YKAS S FLT S FFTTLES RI T FLLRVT I 

C S IEIISEEAPSSKK— IKLTASEKAQKL — AQ SSEFAKREVFSGDPINKATEWLNGEKW 



H YEVLKIAADILIKEEILSENDQLSNQ\AA^CLLPFWliaNDDTESAEMKIAIYI.SKSGICS 

D Q PWPLAVRHLT SALVSG5YD — TNLVLLALMPLLFPGEALAEHQHKALRI &LG - S DFVS 

A SHNVQLAVDWALSLKIAVSSFGNQTDSTEKVTSAMFPFLLIQPKTWNLNLLVLKLGKDV 

S QI ILPWLYSLQSKDTEVRSR ALNLILTFLELRN ENLEFSIIYG 

Y SPAAPTALKLISLNNIAKYIN — S — IEKEVNIFTLVPCLICALRDAS IKVRTG 

C DKVEWAXNEMAQRGEKYFSRK VEDDVEQFVLEIVKWG VGGVKQIDG 



H LHPLLRGWEEALENVIKSTKPGKLIGVANQKMIELLAD— NINLGDPS— SMLKMVEDLISV 

D KVPFLA — ELKVSNKFSDFN VGEHRQHFLDIIASSNQELSSQERALLQSVEDHG — 

A NWPLFK — NLAADDGMKKLP DIMSTNLSSISMDIINDLG EALSLDPDER — 

S MD DNDNKNLR WLSPVETKYYCSD LLLD 

Y VK KILSLIAKRP STKHYFLSDKLYGENVTIP MLN 

C GS VKAALAGAN 1NPQFVADLLTK— FDGVS 



H GEE5SFNLKQKVTFHVILSVLVSCCSS-LKETHFPFAIRVFSLLQKKIKKLESVITAVEI 

D GELYIQKASQLTHLLLLLTAYAKRELQPRESLHMLEKIGLYSRRLQFRWNGSQNTQNCA 

A RIELXERACNYKLSEVLETCSNIKCSE QDRNKLQKGLLIRESVSALNIDVINKLVEA 

S RSSEI GiDGT YL FS YIPERL FTEKKPK NAS KE IAVT S FLS SHAACSKL SNVRVXL 

Y PKDSEAWLSGFLNEYVTENYDISRILT PKRNEKVFLMFWANQAIiLI PSPYAKTVL 

C EIAPKRTKGAQKKNLVEKTFGTEESWE AFNQRWFVLDIiLNARQIIPSSEKVIiAA 



H PSEWHIELMLDRGI PVELWAHYVEELNSTQRVAVEDSVFLVFSLK-KFIYALKAPKSFPK 

D PI.QLYVDFLLT— LVKNTKWT ALASTPWNQMTDELRLCLR1L-EI ICAQVFSEKADQ 

A FMMH-PADYIQWL TTEWEELEVEVDVSLKE1SK5NCOELLYQLLDT 

S LEILTRV HGKVEDAKMQILLPRL — EQLSEFNSEKFKT 

Y LDNLNKS PTYASSYSSLFEEFISHYLENRSSWEKSCIANK 

c l FAWKQ VN SKS DVE S S S YQQHLAVN- AI RK I LEHPEKTKI 
Figure 3 (following)

H GDIWWNPEQLKEDSRDYLHLLIGLFEMMLNGADAVHFRVLMKLFIKVHLEDVFQLFKFCS 

D PERQ-EWTRALQQSLQLILPEAQ— -D RLEVLSNFYVFERLP 

A SDFTALNSKDVKAAAINCIEALFN LRAA IYGSSFDE 

S VSKREVEALVNCFNHTS FTSLLSFLSSNI 

Y TNFEHFERSLVNLVSPKE KQSFMIDFVLSAX.NS 

C GASEVOMDCVIETM RSTHNHH 

H VLWTYGSSLSNPLNCSVKTVLQTQALYVGCAMLSSQKTQCKHQLASI5SPVVTSLLINLG 

D ELWPRDSDYA VFRLQGFIILEAVLSNPKSQIDCGLVHVLR VANACG 

A LLG MIVQQRRLILSDNKFFA — SYLTSLLSSTTN DLLVPVG 

S VLS ' QAI CRRI VEIQS HLKD — PQRLE FVKAVI S QDEQ 

Y DYEQ-LA NIAAERLISIFASLNN — AQKLKIVQNIVD SSSNVES 

C LLR DCLRLIVAAAKHTP — NSWKHVMSVFT FMGNG 

H SPVKEVRRAAIQCLQALSG — VASPFYLI IDHLISKAEEITSD-AAYVIQDLATLFEELQ 

D SPLQTLRVQAINILQLISNRKLVSHVEQLVRSLLQRKSELSMDHEQYALILYTILEPEKA 

A LQKRFDQSTKENILSVILLCAEDLPAYGK1RVLSLLKDLGIMLMRDEIVKLLSQLLDK 

S PHYYVDVLDSIKIFDTVFK KLIGSVRLVKEKNPAIAKR -KRIDSHIFDG — 

Y SYDTVGVLQ5LFLDSDIFVS 1 LNQN S I SNEMDQTDFSKRR- RRRSST S KNAFLKEEV 

C MLRKDNEliTLSIVEKTVES LFSTIINSSGQAVLTKQQ-QTEKLIELARLFAASA 

H REKKLKSHQKLSETLKNLLSCVYSCPSYIAKDLMKVLQGVNGEMVLSQLLPMAEQLLEKI 

D TAKERLV1SKLKRSV1ALA5DPKQSP-ICTASLLAALKHVNDENFLNELLPLG1DSLKTI 

A RSQYYYKLDKTSQPLSDTEVDLLCLLLECSMMRTSSFKGQS LDDHILSALNVDCMA 

S DVQRLTRILELLETKNAASYFKLASPLFEVLNSVIA LKEDIVS SNYLLQLL 

Y SQLAELHLRKLTI ILEALDKVRNVGSEKLLFTLLSLLSDLET LDQDGGL PVL YAQET 

C I DIPAHRRARIAQAIARAVQAENAST — WLVLVSSFCARWQ RSSDAAAQEAMKRGS 

H QK-EPTAVLKDEAMVLHLTLGKYNE-FSVSLLNEDPKSLDIFIKAVHTTKELY-AGMPTI 

D TAGEDNQNIKQLFWPHSEIYKSVIERFEGRVALNVLLRKDLAWKLFEDSFAQY-DTYVQL 

A S E-RPAVI 5 FCL.T I LEKL SNRFYDE LQT DVQIRFFHKLVSMFRSSNGS I 

S LG LLYEMIGASPITELSP SIRIDTLVGCIRST — NNPQI 

Y LI SCTLNTITYLKEHGCTE3LTN VRADILVSAIRNS — ASPQV 

C DQ DAYDDLAIELLSALNP FEQLS S VLEMCE YVRRIiGG DK 

K QI TALEKI TKPFFAAI SDEKVQQKLLRML FDLLVNCKNSHCAQT VS SVFKGI S -VNAEQV 

D EQ-KLQPIiPCVLLNSLTPETFEQMHAKHKIAIiIKLIVESATNSDNDSIFIiASH— RLLKRC 

A QNGA^AVLRLKLSSSTVVLALDRITQQOTLVIGSLSKKKKQKKNSKSCPEED-INSEEF 

S QN — KAXLLVSALANAAPEAVLHGVMPIFTFMGSTVL5RDDAFSIHVIEQTVKTVISA1I 

Y QN KLLLVlGSLATIiSSEVILHSVMPIFTFMGAHSIRQDDEFTTBCWERTlLTWPALI 

C PA — KSTTTKKDLDTMIFDRTAQTLPRIRHFRYWVTLISRIFSNRVLIERLAAYDDEEL 

H RIELEPP- DKAKPL GTVQQKRRQK-MQQKKSQDLE SVQEVGGS- Y-WQRVTLILELLQHK 

D RLDCQP LVPILIiEMANTKVEK-KQPVKRRSVQATQLDLTSPY-WKQGMTLLELiEHK 

A RSGEKAL-SFIASZ.LDMLLLKKDLTHRESLIRPLFKLLQRSMSKE--WVKTAFSIEETSLQ 

S RLGKDF DS SLLVS CFVNAFPH I PQHRRLRL YRjLVXiQT I GS NRIXSWLIQFAE 

Y KNSKGNEKEEMEFLLLSFTTALQHVPRHRRVKLFSTLIKTLDPVKALGSFLFLIAQQYSS 
C LKNALP LGKRL I E C S VEL DE FANKEAN DQDGS D P QAQRYWVAFAS RT EWS EKLRHL 
Figure 3 (following)

H XKLRSPQILVPTLFNLL5RCLEPLPQEQGNMEYTKQLILSCLLNICQKLSFDGGKIPKDI 
D KQLVGAELLIFPLFELLQACLT — MEEHSAAEYPKQLILSSLLHCCQTAQSAGVQLVKAM 

A PPQ-DVRETTPTFI SSIQQTLL LILKDI FDSLNMN-PLKAEVANEI 

S KML L AKSTNWAIHDFCLT L — VQS F SVADRI 

Y ALVNFKI GEARI L IE F I KALLV ■ D LHVNEELS — G — LNDLL 

C LPG GVAARLI ADVLQE CVN DKK MSYKM 



H LDEEKFNVELIVQCIRlSEMPQTHHHALliLLGTVAGIFPDKVLHNIMSIFTFMGANVMRL 

D p-ESS FRI EL WQSLRNTRN PQTQQH ALLFLTHCAGMYPQQVLHKI VE I FT FVGSTVARH 

A N VKMLVELAHSSNDGVTRNHIFSLFTAIVKFVPDKVLDHIISILTLVGESTVTQ 

S G SIN-QCSRFCLKSLEEQSNSDSNGKAVSLIK1DELPMDVDLATLGSLRVKVL 

Y D IIKLLTSSK5SSEKKKSLESRVLFSNGVLNFSESEFLTFMNNTFEFIN-KITEE 

C C EKVLQLANIKLG HDRYLFA-DSGINEKELITLAQALNKFIVAETKSE 



H DDT YS FQVI NKTVKMVI PAL I QSDS GDS IEVS RNVEE I WKI I SVFVDAL PHVPEHRRLP 

D DDAFSLHI IHNWESIIFILLLN— TG HNELVI PVLKVFAD1CTDVPVHRRLP 

A I D5HSKS I FEGFI SMVI PFWL SK TKS EEQLLQI FVKVL PDI VEHRRRS 

S ELI SLVSKAKNFAFDLAKIMENS VDS FVE I QAGLFES IK 

Y TDQDYYDVRRNLRLKVYSVLLDETSD KKLIRNIREEFGTLLEGVLFF-INS 

C EKMRMCQNSAYTLKLIAKNLPSQ SESLVLADTMQR-CVS 



ILVQLVDTLGAEKFLWILLILLFEQYVTKTVLAAAYGEKDAILEADTEFWFSVCCEFSVQ 

1YATLFRVLEPKEHLWQFLCI I FES QVLLEQVPQKVSTDKSRLDFARELTLMFEDP 

IVAYLLGWTS LLQQ 

LLITLSQQSSNE MELG 

VELTFSCITSQE NEEA5 

IVSQYQKLDEN •- 



-HQIQSLMNILQYLLKLPEEKEETIPKAVSFNKSESQEEMLQVFNVETHTSKQLRHFKFL 

TVAIQTCIRLLDYLAKLPATKSSLSGGSGSSVLSTEQ QL FDVRTRT FKQLRH YKYL 

QTDYNGTKKVLGLISERAKDTS S5 KMKHKRKI 

HVYVALRSVI HLLFNEL FCTVLG KLLHDERA 

DSETSLSDHTTEIKEILFKVLGN VLQILPVDEFV 

LTGNVLLLAGELIRS HNMRS 



H SVSFMSQLLSSNNFIiKKWESGGPEILKGLEERLIiETVLGYISAVAQSMERNADKLTVK- 

D IMDFX.SGISSCNEWEECKMKRPDPNELLPYYQEFIIiKT-LAYVGVLNGALBAASETPSI.EK 

A S N — QK GRN ■ S 

S LLR RK ALS IVQ 

Y NAVLPLLSTSTNEDIR YHLT LVIGS 

C T 1 



H FWRALLSKAYDLLDKVNALLPTETFIPVIRGLVGNPLPSVRRKALDLLNNKLQQNISWKK 

D FWRVIANHAHDVLDNAIGLIAPQHFISVITELLKHDHVYVRIKVMDLLVTKLSPSSDYFQ 

A -WLNLDEVAVDSFGKMCEEIV — HLINATDDESGVPVKRAAISTLEVLAGRFP SGH 

S — QR VQQ G S KVS ALT AL I P DVT YNISNYSDE ETTQLAMDCLAVMAKRFS 

Y KFELEGSEAI PI VNNVMKVLL — DRMPLE S KS — WIS QVI LNTMT AL VS KYG 

C H-HATSLLKTCLATVQ ECIARFSKP QYDSAASPGSSVAGGRGN 
Figure 3 (following)

H -TIVTRFIiKLVPDLLAlVQ — RKKKEGEEEQAINRQTALYTLKLLCKNFGAENPDPFVPV 

D QSNAEHFGVLFAPLQEI INGILEGSSNSAQQAKLQQTALHALQLLALRHGRDYIEECRSL 

A PIFRKCLAAVAECIS SKNLGVS — SSCXRT 

5 ASPELFISPIEWS ■ GPYGLKN-SARDVQ 

Y KKLEGSILTQALTLAT EKVSSD MTEVK 

C RG-HRIRQQSLGG NKFGSD TLL 



H LSTAVKLIAPERKEEKNVLGSALLCIAEVTSTLEALAIPQLPSLMPSLLTTMKN- TS 

D IATLTKITKRPANVPKAWG3WV1TLVEICASLKAHAIAQLPKFAPQLTELLKEQVHQMA 

A TGALINVLG PKALIELPCIMKNLVKQSLEVSFASQS G RN 

S VSAIVCITV LTNTLAARILPYLADIVNYSLSILDDAR KD 

Y ISSLALITN CVQVLGVKSIAFYPKIVPP5IKLFDASLADS SN 

C ICSLTCI QR VYDQFAS FWE S TGDVI IR YCRL I ARFG D 



H ELVSSEVYLLSALA-ALQKWETLPHFISPYLEGILS QVIH1EKITSEMGSASQAN 

D SLKQGPDYVCSTLVTALHKLFKALPLFI.GPYI,VDI IGGLARLSVQLENPQLLQDKRTQVL 

A ATAEEQLLMLSVLV-TLEAVIDKLGGFLNPHLGDIMK IMVLHPEYVS DFDKNLK 

S PEGDLLELACFS-MMIDFFKVLPEFSSSYVEPTIK C ALAS DRA FEHDAI 

Y — PLKEQLQVAILL-LFAGLIKRIPSFLMSNILDVLH VI YFSREVDS SIR 

C PSELLALNQPS — SSTTAAFQGGSQTSGFGSKTG IHHRLSLIRRSLLS 



IR— LTSLKKTLATTLAPRVLLPAIKKTYK— QIEKNWKNHMGPFMS— ILQEHIGAMKKEEL 
KQKLADVWSAVAQGVE VRI LVPS CAKAFS SLLEQQAYDELGHLMQQLLL QSVRHNSAAQL 
SK-ANAIRRLLTDKIPVRLTLQPLLRIYNEAVSSGNASLVIAFNM — LEDLWKMDRSSI 

GE LLFETIANFIPTRLLMKSIFAAWPECARLGSTAALRLLEL — IELALQNSSRSAI 

LS VISLIIENIDLKEVLKVLFRIWSTEIATSNDTVAVSLFLSTLESTVENIDKKSA 

IE LRVLPAHIVKTVGELKTEKKALS ALFNLLTGYI ETQH — Q-QKPE ILRKS VI 



T S HQSQLTAFFLEALDFRAQHSEN — DLEEVGKTENCI I DCLVAMWKL SE VT FRPLFFK 
QPVQDPLS EL FLQALNFR1QVRGLGLQRQLVS DVEAS I TE X FVTWILK1 SETS FRPMYSR 

VS 5 H GK I FDQCLVAL D I RRLN PAA IQNIDDAERSVTSAMVALTKKLTESEFRPLFIR 

GTVYKS I FKFFIiDS FDSRRSLLFA EDVDNVETQAVNVFLKFVMKLSDTTFRPLFLH 

TSQ.SPIF FKLLIiS LFEFRSI5SFD N-NTXSRI EAS VHE I SNS YVLKMNDKVFR PL FVI 

QLRRT FVS DVI T E TL I VRS QERQS D-Q FENVEKL EH T VFNFVI SIASILS EVE FRTWNE 



LFDWAKTEDAP K DRLLTFYNLADCIAEKLKGLFT LFAGHLVKPFADTL 

VHKWALESTSR E TRLTYFL— LTNRIAEALKSLFV LFASDFVEDSSRLL 

S I DWAES DWDGS G SENKS I DRAI S FYGLVDRLCE S HRS I FVP YFKYVL DG I VAHLTTAE 

LHSWALEDLYETD — PSGIVSRQTFFYNFLTIFLDTLKSIVT N-YYAYVLDDT 

LVRWAFDGEGVTN-AGITETERLLAFFKFFNKLQENLRGIIT S YFTYLLE PVDMLL 

LVAWAEPGLEAKA DLAARLRLVSLLHFANDlYTSFNSIJVLP YFGRI LEI S ALVL 



DQVNISKTDEAFFDSENDPE KCCLLLQFILNCXYKIFLFDT QHFISKERAGALMMP 

TEHNS IRPE FEVEEREDD VDIXMAILNTLHHVFXYC5 — EDFINDHRFNVLMPP 

AS VS TRKKKKAKI QQTS DSI QPKSWHLRALVLSCLKN CFTiHDTGSLKFL DTNNFQVLLKP 

IELLS5K-0 TNS E VR— HL VNS S LVS AFENDT -EEFWMVPARFGKI S P V 

KRFISKD MEN VNLRR1VINSLTSSLKFDR-DEYWKSTSRFELISVS 

KKCNATLLLGTDELLLSGKRGSIEALETDLALTIAIDVISNAARHRDFFTVDRCQLVSDV 
Figure 3 (following)

H LVDQLENR1 G- GEEK FQERVTKH L I P CIAQFSVAMADDSLWKPLNYQILLKTRDS 

D LVNQLENDLVLGNESLQQVLSN — CIAQFAVATN-DVMWKQLNSQVLLKTRT5 

A IVSQLWEPPSSLKEHPHVP5VDEVDDLLVSCIGQMAVASGSDLLWKPLNHEVLMQTRSE 

S L I EQ I Q YAPLL DDKVL VKAI VE L -AS VAS S-SDN FRSMNTQLLQ YLRS S 

Y LVNQLSN I ENS I GKYLVKAI GA LASNNS GVDEHNQILNKL I VEHMKAS 

C I VNEL VNTKVE GHEKRC S DHLVP AIYRIGNADPDSFPELLNKIMLKTRDS 



sp -kvrfaal i tvlalaeklkenyivll pe 
np-evri: 

S V- RS RML S IE S VKQML DNLKEE YL VL LAEfcr I 



ILAFNSCVAI ARKLGE S YAALL FE T V P F I AELLE DEHQRVEKNTRT 



SI PFLAELMEDECEEVEHQCQK-TIQQLET 
GVQELET 

pflaelledvelsvkslaqd— iikqmee 
ni -narl1ai q i qtqlygrlgenwi stl pds vpfiae1meddddqvetatae-lvri idd 
cssneklwairamkliyskigeswlvllpclvpviaelledddeeierevrtglvkvven 
ra-ki r yral i vlelli ke i gdgvqphl s i llp flnel i e denkqveaqcqk-vinslqh 



VLGE- 
ILGE — S 

MSGE — SLAEYL 

RLGENESfLQ-DYLT— 
VLGE— 
KFGE — T 



-ELQSYF- 
-SVQKYL- 



— PFDRYLD — 
-TFWSGG5SA 



HEAT REPEAT 
Figure 4

BAP28 MTSLAQQLQRLALPQSDASLLSRDEVASLLFDPKEAATIDRDTAFAIGCTGLEELLGIDP

BAP28 SFEQFEAPLFSQLAKTLERSVQTKAVNKQLDENISLFLIHLSPYFLLKPAQKCLEWLIHR

BAP28 FHIHLYNQDSLIACVLPYHETRIFVRVIQLLKINNSKHRWFWLLPVKQSGVPLAKGTLIT

BAP28 HCYKDLGFMDFICSLVTKSVKVFAEYPGSSAQLRVLLAFYASTIVSALVAAEDVSDNIIA

BAP28 KLFPYIQKGLKSSLPDYRAATYMIICQISVKVTMENTFVNSLASQIIKTLTKIPSLIKDG

BAP 2 8 LS ClilVLIjQRQKPESLGKKPFPHLCN VPDliI T ZLHGI S ET YDVS PliLRYMLPHLWS I IH 

BAP28 HVTGEETEGMDGQIYKRHLEAILTKISLKNNLDHLLASLLFEEYISYSSQEEMDSNKVSL 

BAP2 B LNEQFLPLIRLLESKYPRTLDWLSEHLKEIADLKKQELFHQFVSLSTSGGKYQFIADSD 

BAP 2 8 TSIJmSLNHPl-APVRILAMNHLKKIMKTSKEGVDESFIKIAVLARlGDDNlDVVLSAISA 

BAP28 FEIFKEHFSSEVTISNLLNLFQRAELSKNGEWYEVLKIAADILIKEElLSENDQIiSNQW 

BAP2 8 VCLLPFVVINNDDTESAEMKIAIYLSKSGICSLHPLLRGWEEALENV1KSTKPGKLIGVA 

BAP 2 8 NQKMIELLADNINLGDPSSMLKMVEDLISVGEEESFNLKQKVTFHVILSVLVSCCSSLKE 

BAP2 8 THFPFAIRVFSLLQKKIKKLESVITAVE I PSEWHIELMLDRGI PVELWAHYVEELNSTQR 

bap2 8 vavedsvflvfslkkfiyalkapksfpkgdiwwnpeqlkedsrdylhlliglfemmlnga 

BAP 2 8 davhfrvlmklfikvhledvfqlfkfcsvlwtygsslsnplncsvktvlqtqalyvgcam 

BAP28 LSSQKTQCKHQLASIS5PWTSLLINLGSPVKEVRRAAIQC1QALS-GVASPFYLIIDHI, 

Tetraodonl ■ FPSLLCCLSSPVQEVRRVSLGALQSLSRARASPFWPIMEKL 



BAP 2 8 iskaeeitsdaayviqdlatlfeelqrekklkshqklsetlknllscvyscpsyiakdlm 
Tetraodonl lrttdeliadpsylsqvrrrspasgdlrfwlltpsvcvcclg YRPSRRRPGLVLI 



bap 2 8 ecvlqgvngemvlsqllpmaeqllekiqkeptavlkdeamvlhlxlgkynefsvsllnedp 

Tetraodonl pvW-VFCQSILSALLPLLERLLEQSSPDTPNQLRDEAQUVI.LILSKYNEASAPI.LAKDE 



BAP28 ksldifikavhttkelyagmptiqitalekitkpffaaisdekvqqkllrmlfdllvnck 

Tetraodonl NCLDLFIRALRNSTQQHLDIPSCQIFALEQITKSFFSAIESETVXQKLLSVMFDLLAENX 

BAP 2 8 NSHCAQTVSSVFKGISVNAEQVBIELEPPDKAKPLGTVQQKRRQKMQ-QKKSQDLE5VQEV 

Tetraodonl XPLVAI T I GS VFKRI T VDAQLVANELAPADKAS I SMTVQQS RRSRMI L 



BAP28 GGSYWQRVTLILELLQHKXKLRSPQILVPTLFNLLSRCLEPLPQEQGNMEYTKQLILSCLL 
BAP 2 8 NICQKLSPDGGKIPKDILDEEKFNVELIVQCIRLSEMPQTHHHA1LLLGTVAGIFPDKVL 
BAP 2 8 HNIMSIFTFMGANVMRLDBTYSFQVINKTVKMVIPALIQSDSGDSIEVSRNVEEIVVKII 

BAP28 SVFVDALPHVPEHRRLPILVQLVDTLGAEKFLWIIiLILLFEQYVTKTVlAAAYGEKDAIL 
Tetraodon2 L PVL VQL VET L GPAR FLWVLMLLL FKL HATHT ANT AS E — KDAAV 



BAP28 EADTEFWFSVCCEFSVQHQIQSLMNILQYLLKLPEEKEETIPKAVSFNKSESQEE ■ 

Tetraodon2 EKDVDFWISLCSQFKVGEQLASLNHILGFLLQLPEDKDEAASKHATGRRTTQKKEKEEQG 



BAP 2 8 MLQVFNVETHTSKQLRHFKFLSVSFMSQLLSSNNFLKKWESGGP-EI1KGLEERI.L 

Tetraodon2 DKMEELIFSVEAHSSKELRHFKFISVSFMAQLLGSASFIGKVSEITTSNSLLLSLKRMLL 



BAP28 ElVLGYISAVAQSMERNADKLTVKFWRALLSECAYDiI.DKVNAI.t.PTETFIPVIRGI,VGIJP 
Tetraodon2 EDLLRYIHSIARSVEENAMKPTAKFWRVLLNKAYDVLDKVNSLLPTDTFIVVMKGLMGND 
Figure 4 (following)

BAP 2 8 LPSVRRKALDLLNNKLQQNlSWKKTIVTRFLKLVPDLLAIVQRKKKEGEEEQAINRQTAXi 
Tetraodon2 LFSVRRKAMELLNNKL 

BAP 2 8 

YTLKLLCKNFGAENPDPFVPVLSTAVKLIAPERKEEKNVLGSALLCIAEVTSTLEALAIPQLPSLMPSL 
BAP 2 8 LTTMKNTSELVSSEVYLL5ALAALQKWETLPHFISPYLEGILSQVIHLEKITSEMGSAS 
BAP2B QANIRLTSLKKTLATTIAPRVLLPAIKKTYKQIEKNWKNHMGPFMSILQEHIGAMKKEEL 
BAP 2 8 TSHQSQLTAFFLEALDFRAQHSENDLEEVGKTENCIIDCLVAMWKLSEVTFRPLFFKLF 

BAP28 DWAKTEDAPKDRLLTFYNLADCIAEKLKGLFTLFAGHLVKPFADTLDQVNISKTDEAFFD 
Tetraodon3 ' evlfe 

BAP 2 8 SENDPEKCCLLLQFILNCLYKIFLFDTQHFISKERAGALMMPLVDQLENRLGGEEKFQER 
Tetraodon3 SSHADQKVALXLQYVLXCLHKIFLYDTQRFLSKERADTLMNPLLDQLEN-PAGGPQTYQQR 

BAP2 0 VTKHLIPCIAQFSVAI^DDSLWKPLNYQILLKTRDSSPPCVRFAALITVLALAEKLKENYI 
Tetraodon3 VTQHLVPCLGQFAVALADDTQWKTLNYXXXLKSRHSDAKVRFSSLLMLMX1TSKLKENYM 

BAF2 8 VLX. PE S I PFLAELME DECEEVEHQCQKT I QQL ET VLGEPLQS YF 
tetraodon3 VLL PET I PFLAELME ■ 
SEQUENCE LISTING



<110> Barry, Caroline
Bougueleret, Lydie
Chumakov, Ilya
Cohen-Akenine, Annick

<120> A NOVEL BAP28 GENE AND PROTEIN

<130> GENSET.063AUS

<141> 2000-06-23

<150> US 60/141,323
<151> 1999-06-25

<150> US 60/176,880
<151> 2000-01-18

<160> 63

<170> Patent.pm

<210> 1 

<211> 97662 

<212> DNA 

<213> Homo sapiens 

<220>
<221> misc_feature
<222> 2996..4996
<223> 5' regulatory region BAP28

<220>
<221> exon
<222> 4997..5076
<223> exon 01 BAP28

<220>
<221> exon
<222> 5371..5544
<223> exon 02 BAP28

<220>
<221> exon
<222> 6121..6337
<223> exon 03 BAP28

<220>
<221> exon
<222> 9877..10018
<223> exon 04 BAP28

<220>
<221> exon
<222> 11522..11623
<223> exon 05 BAP28

<220>
<221> exon
<222> 12521..12661
<223> exon 06 BAP28



<220>
<221> exon
<222> 13453..13664
<223> exon 07 BAP28

<220>
<221> exon
<222> 13824..13957
<223> exon 08 BAP28

<220>
<221> exon
<222> 15376..15478
<223> exon 09 BAP28

<220>
<221> exon
<222> 16855..16965
<223> exon 10 BAP28

<220>
<221> exon
<222> 17378..17495
<223> exon 11 BAP28

<220>
<221> exon
<222> 18535..18642
<223> exon 12 BAP28

<220>
<221> exon
<222> 21446..21541
<223> exon 13 BAP28

<220>
<221> exon
<222> 21999..22087
<223> exon 14 BAP28

<220>
<221> exon
<222> 23036..23247
<223> exon 15 BAP28

<220>
<221> exon
<222> 23546..23667
<223> exon 16 BAP28

<220>
<221> exon
<222> 24270..24461
<223> exon 17 BAP28

<220>
<221> exon
<222> 26287..26470
<223> exon 18 BAP28



<220>
<221> exon
<222> 26611..26747
<223> exon 19 BAP28

<220>
<221> exon
<222> 28068..28260
<223> exon 20 BAP28

<220>
<221> exon
<222> 32540..32709
<223> exon 21 BAP28

<220>
<221> exon
<222> 33112..33270
<223> exon 22 BAP28

<220>
<221> exon
<222> 34586..34828
<223> exon 23 BAP28

<220>
<221> exon
<222> 35156..35287
<223> exon 24 BAP28

<220>
<221> exon
<222> 36660..36763
<223> exon 25 BAP28

<220>
<221> exon
<222> 36934..37077
<223> exon 26 BAP28

<220>
<221> exon
<222> 37803..37921
<223> exon 27 BAP28

<220>
<221> exon
<222> 38017..38138
<223> exon 28 BAP28

<220>
<221> exon
<222> 40365..40493
<223> exon 29 BAP28

<220>
<221> exon
<222> 42618..42848
<223> exon 30 BAP28

<220>
<221> exon
<222> 43452..43578
<223> exon 31 BAP28

<220>
<221> exon
<222> 44836..44999
<223> exon 32 BAP28

<220>
<221> exon
<222> 48223..48269
<223> exon 33 BAP28

<220>
<221> exon
<222> 49656..49779
<223> exon 34 BAP28

<220>
<221> exon
<222> 50358..50498
<223> exon 35 BAP28

<220>
<221> exon
<222> 50964..51256
<223> exon 36 BAP28

<220>
<221> exon
<222> 52148..52298
<223> exon 37 BAP28

<220>
<221> exon
<222> 53235..53393
<223> exon 38 BAP28

<220>
<221> exon
<222> 53554..53688
<223> exon 39 BAP28

<220>
<221> exon
<222> 53838..53942
<223> exon 40 BAP28

<220>
<221> exon
<222> 54029..54197
<223> exon 41 BAP28

<220>
<221> exon
<222> 54741..54895
<223> exon 42 BAP28

<220>
<221> exon
<222> 55754..55912
<223> exon 43 BAP28

<220>
<221> exon
<222> 57386..57494
<223> exon 44 BAP28

<220>
<221> exon
<222> 58504..58827
<223> exon 45 BAP28

<220>
<221> exon
<222> 58504..59354
<223> exon 45b BAP28

<220>
<221> exon
<222> 85947..86168
<223> exon B' BAP28

<220>
<221> exon
<222> 91229..91851
<223> exon A' BAP28

<220>
<221> misc_feature
<222> 91852..97662
<223> 3' regulatory region BAP28

<220>
<221> misc_feature
<222> 55071..57071
<223> 3' regulatory region PCTA

<220>
<221> exon
<222> 57072..58406
<223> exon 9ter PCTA complement

<220>
<221> exon
<222> 57072..61478
<223> exon 9 PCTA complement

<220>
<221> exon
<222> 61344..61478
<223> exon 9bis PCTA complement

<220>
<221> exon
<222> 64578..64743
<223> exon 8 PCTA complement

<220>
<221> exon
<222> 65844..65932
<223> exon 7 PCTA complement

<220>
<221> exon
<222> 66452..66577
<223> exon 6b PCTA complement

<220>
<221> exon
<222> 66705..66731
<223> exon 6 PCTA complement

<220>
<221> exon
<222> 67782..67838
<223> exon 5 PCTA complement

<220>
<221> exon
<222> 68810..68929
<223> exon 4 PCTA complement

<220>
<221> exon
<222> 70404..70614
<223> exon 3 PCTA complement

<220>
<221> exon
<222> 71931..72019
<223> exon 2 PCTA complement

<220>
<221> exon
<222> 83433..83580
<223> exon 1 PCTA complement

<220>
<221> exon
<222> 85486..85577
<223> exon 0 PCTA complement

<220>
<221> exon
<222> 85923..86108
<223> exon B PCTA complement

<220>
<221> exon
<222> 91043..91119
<223> exon D PCTA complement

<220>
<221> exon
<222> 91259..91325
<223> exon A PCTA complement

<220>
<221> exon
<222> 92449..92662
<223> exon C PCTA complement

<220>
<221> misc_feature
<222> 92663..94662
<223> 5' regulatory region PCTA

<220>
<221> allele
<222> 4972
<223> 5-381-133 : polymorphic base A or G

<220>
<221> allele
<222> 5468
<223> 5-382-162 : polymorphic base C or T

<220>
<221> allele
<222> 5616
<223> 5-382-310 : polymorphic base C or T

<220>
<221> allele
<222> 5622
<223> 5-382-316 : polymorphic base G or C

<220>
<221> allele
<222> 13158
<223> 99-7190-213 : polymorphic base C or T

<220>
<221> allele
<222> 23761
<223> 99-7203-282 : polymorphic base A or T

<220>
<221> allele
<222> 23765
<223> 99-7203-286 : polymorphic base C or T

<220>
<221> allele
<222> 27928
<223> 5-383-42 : polymorphic base A or G

<220>
<221> allele
<222> 28070
<223> 5-383-184 : polymorphic base G or T

<220>
<221> allele
<222> 30061
<223> 99-7205-228 : polymorphic base A or G

<220>
<221> allele
<222> 32750
<223> 5-384-312 : polymorphic base G or C

<220>
<221> allele
<222> 48189
<223> 5-379-80 : polymorphic base A or C

<220>
<221> allele
<222> 49615
<223> 5-380-58 : polymorphic base G or T

<220>
<221> allele
<222> 49616
<223> 5-380-59 : polymorphic base C or T

<220>
<221> allele
<222> 50304
<223> 5-366-143 : polymorphic base A or G

<220>
<221> allele
<222> 51133
<223> 5-370-197 : polymorphic base A or G

<220>
<221> allele
<222> 51183
<223> 5-370-247 : polymorphic base C or T

<220>
<221> allele
<222> 53534
<223> 5-373-98 : polymorphic base C or T

<220>
<221> allele
<222> 53600
<223> 5-373-164 : polymorphic base C or T

<220>
<221> allele
<222> 53658
<223> 5-373-222 : polymorphic base A or G

<220>
<221> allele
<222> 54173
<223> 5-375-200 : polymorphic base A or G

<220>
<221> allele
<222> 54232
<223> 5-375-259 : polymorphic base C or T

<220>
<221> allele
<222> 54269
<223> 5-375-296 : polymorphic base G or C

<220>
<221> allele
<222> 54372
<223> 5-375-399 : polymorphic base G or C

<220>
<221> allele
<222> 54867
<223> 5-376-266 : polymorphic base A or G
<220>
<221> allele
<222> 55689
<223> 5-377-82 : polymorphic base C or T

<220>
<221> allele
<222> 55834
<223> 5-377-227 : polymorphic base A or G

<220>
<221> allele
<222> 59937
<223> 5-14-165 : polymorphic base A or G

<220>
<221> allele
<222> 60980
<223> 5-11-158 : polymorphic base C or T

<220>
<221> allele
<222> 66492
<223> 5-202-117 : polymorphic base A or T

<220>
<221> allele
<222> 66514
<223> 5-202-95 : polymorphic base A or C

<220>
<221> allele
<222> 71834
<223> 99-1605-112 : polymorphic base A or G

<220>
<221> allele
<222> 71993
<223> 5-2-178 : polymorphic base A or G

<220>
<221> allele
<222> 85702
<223> 5-171-204 : polymorphic base A or G

<220>
<221> allele
<222> 86504
<223> 5-169-97 : polymorphic base G or C

<220>
<221> allele
<222> 87135
<223> 99-1572-440 : polymorphic base A or G

<220>
<221> allele
<222> 91093
<223> 5-403-325 : polymorphic base C or T

<220>
<221> allele
<222> 91124
<223> 5-403-294 : polymorphic base A or G

<220>
<221> allele
<222> 91209
<223> 5-403-209 : polymorphic base C or T

<220>
<221> allele
<222> 91262
<223> 5-403-156 : polymorphic base C or T
<220>
<221> primer_bind
<222> 4840..4859
<223> 5-381.pu

<220>
<221> primer_bind
<222> 5249..5266
<223> 5-381.rp complement

<220>
<221> primer_bind
<222> 5307..5324
<223> 5-382.pu

<220>
<221> primer_bind
<222> 5710..5729
<223> 5-382.rp complement

<220>
<221> primer_bind
<222> 12946..12963
<223> 99-7190.pu

<220>
<221> primer_bind
<222> 13471..13488
<223> 99-7190.rp complement

<220>
<221> primer_bind
<222> 23482..23501
<223> 99-7203.pu

<220>
<221> primer_bind
<222> 23909..23929
<223> 99-7203.rp complement

<220>
<221> primer_bind
<222> 27887..27904
<223> 5-383.pu

<220>
<221> primer_bind
<222> 28296..28315
<223> 5-383.rp complement

<220>
<221> primer_bind
<222> 29833..29853
<223> 99-7205.rp

<220>
<221> primer_bind
<222> 30270..30288
<223> 99-7205.pu complement

<220>
<221> primer_bind
<222> 32439..32457
<223> 5-384.pu

<220>
<221> primer_bind
<222> 32858..32877
<223> 5-384.rp complement

<220>
<221> primer_bind
<222> 48110..48127
<223> 5-379.pu

<220>
<221> primer_bind
<222> 48441..48460
<223> 5-379.rp complement

<220>
<221> primer_bind
<222> 49558..49577
<223> 5-380.pu

<220>
<221> primer_bind
<222> 49958..49977
<223> 5-380.rp complement

<220>
<221> primer_bind
<222> 50162..50180
<223> 5-366.pu

<220>
<221> primer_bind
<222> 50564..50583
<223> 5-366.rp complement

<220>
<221> primer_bind
<222> 50937..50955
<223> 5-370.pu

<220>
<221> primer_bind
<222> 51341..51359
<223> 5-370.rp complement

<220>
<221> primer_bind
<222> 53437..53455
<223> 5-373.pu

<220>
<221> primer_bind
<222> 53840..53858
<223> 5-373.rp complement

<220>
<221> primer_bind
<222> 53974..53993
<223> 5-375.pu

<220>
<221> primer_bind
<222> 54375..54394
<223> 5-375.rp complement

<220>
<221> primer_bind
<222> 54602..54619
<223> 5-376.pu

<220>
<221> primer_bind
<222> 55002..55021
<223> 5-376.rp complement

<220>
<221> primer_bind
<222> 55608..55625
<223> 5-377.pu

<220>
<221> primer_bind
<222> 56025..56043
<223> 5-377.rp complement

<220>
<221> primer_bind
<222> 59673..59692
<223> 5-14.rp

<220>
<221> primer_bind
<222> 60083..60100
<223> 5-14.pu complement

<220>
<221> primer_bind
<222> 60718..60737
<223> 5-11.rp

<220>
<221> primer_bind
<222> 61119..61137
<223> 5-11.pu complement

<220>
<221> primer_bind
<222> 66177..66194
<223> 5-202.rp

<220>
<221> primer_bind
<222> 66589..66608
<223> 5-202.pu complement

<220>
<221> primer_bind
<222> 71723..71743
<223> 99-1605.pu

<220>
<221> primer_bind
<222> 71735..71754
<223> 5-2.rp

<220>
<221> primer_bind
<222> 72150..72169
<223> 5-2.pu complement

<220>
<221> primer_bind
<222> 72150..72170
<223> 99-1605.rp complement

<220>
<221> primer_bind
<222> 85485..85502
<223> 5-171.rp

<220>
<221> primer_bind
<222> 85887..85905
<223> 5-171.pu complement

<220>
<221> primer_bind
<222> 86184..86203
<223> 5-169.rp

<220>
<221> primer_bind
<222> 86581..86600
<223> 5-169.pu complement

<220>
<221> primer_bind
<222> 86932..86952
<223> 99-1572.rp

<220>
<221> primer_bind
<222> 87556..87574
<223> 99-1572.pu complement

<220> 



<221> primer_bind 
<222> 91068 . . 91085 
<223> 5-403. rp 



<220> 

<221> primer_bind
<222> 91398. .91417 
<223> 5-403. pu complement 



<220> 

<221> primer_bind 
<222> 4953 . .4971 
<223> 5-381-133 .mis 



<220> 

<221> primer_bind 

<222> 4973 . .4991 

<223> 5-381-133 .mis complement 



<220> 

<221> primer_bind 
<222> 5449. .5467 
<223> 5-382-162 .mis 



<220> 

<221> primer_bind 

<222> 5469. .5487 

<223> 5-382-162 .mis complement 



<220> 

<221> primer_bind 
<222> 5597 . .5615 
<223> 5-382-310 .mis 



<220> 

<221> primer_bind 
<222> 5603 . .5621 
<223> 5-382-316 .mis 



<220> 

<221> primer_bind 
<222> 5617. .5635 

<223> 5-382-310 .mis complement 
<220> 

<221> primer_bind
<222> 5623 . .5641 

<223> 5-382-316 .mis complement 



<220> 

<221> primer_bind 
<222> 13139 . . 13157 
<223> 99-7190-213 .mis 



<220> 

<221> primer_bind 
<222> 13159 . . 13177 
<223> 99-7190-213 .mis complement 



<220> 

<221> primer_bind 
<222> 23742 . .23760 
<223> 99-7203-282.mis



<220> 

<221> primer_bind 
<222> 23746. .23764 
<223> 99-7203-286. mis 

<220> 

<221> primer_bind 
<222> 23762 . .23780 
<223> 99-7203-282 .mis complement 

<220> 

<221> primer_bind 
<222> 23766. .23784 
<223> 99-7203-286.mis complement

<220> 

<221> primer_bind
<222> 27909. .27927 
<223> 5-383-42. mis 

<220> 

<221> primer_bind 
<222> 27929 . .27947 
<223> 5-383-42 .mis complement 

<220> 

<221> primer_bind 
<222> 28051. .28069 
<223> 5-383-184 .mis 

<220> 

<221> primer_bind 
<222> 28071. .28089 
<223> 5-383-184 .mis complement 

<220> 

<221> primer_bind 

<222> 30042 . .30060 

<223> 99-7205-228 .mis 

<220> 

<221> primer_bind 

<222> 30062 . .30080 

<223> 99-7205-228 .mis complement 

<220> 

<221> primer_bind 

<222> 32731. .32749 

<223> 5-384-312 .mis 

<220> 

<221> primer_bind 

<222> 32751. .32769 

<223> 5-384-312 .mis complement 

<220> 

<221> primer_bind 
<222> 48170 . .48188 
<223> 5-379-80. mis 



<220> 

<221> primer_bind 
<222> 48190 . .48208 
<223> 5-379-80. mis complement 

<220> 

<221> primer_bind 
<222> 49596 . .49614 
<223> 5-380-58. mis 

<220> 

<221> primer_bind

<222> 49597 . .49615 

<223> 5-380-59. mis 

<220> 

<221> primer_bind 

<222> 49616 . .49634 

<223> 5-380-58. mis complement 

<220> 

<221> primer_bind 
<222> 49617 . .49635 
<223> 5-380-59. mis complement 

<220> 

<221> primer_bind
<222> 50285 . .50303 
<223> 5-366-143 .mis 

<220> 

<221> primer_bind 

<222> 50305 . . 50323 

<223> 5-366-143 .mis complement 

<220> 

<221> primer_bind 
<222> 51114 . .51132 
<223> 5-370-197 .mis 

<220> 

<221> primer_bind 
<222> 51134 . .51152 
<223> 5-370-197 .mis complement 

<220> 

<221> primer_bind 
<222> 51164 . .51182 
<223> 5-370-247.mis

<220> 

<221> primer_bind
<222> 51184 . . 51202 
<223> 5-370-247 .mis complement 

<220> 

<221> primer_bind 

<222> 53515. .53533 

<223> 5-373-98. mis 

<220> 

<221> primer_bind 



<222> 53535. .53553 

<223> 5-373-98 .mis complement 



<220> 

<221> primer_bind 
<222> 53581. .53599 
<223> 5-373-164 .mis 

<220> 

<221> primer_bind

<222> 53601. .53619 

<223> 5-373-164.mis complement

<220> 

<221> primer_bind 
<222> 53639 . . 53657 
<223> 5-373-222 .mis 

<220> 

<221> primer_bind 
<222> 53659. .53677 
<223> 5-373-222 .mis complement 

<220> 

<221> primer_bind 
<222> 54154 . . 54172 
<223> 5-375-200.mis

<220> 

<221> primer_bind
<222> 54174 . . 54192 
<223> 5-375-200 .mis complement 

<220> 

<221> primer_bind

<222> 54213 . . 54231 

<223> 5-375-259 .mis 

<220> 

<221> primer_bind

<222> 54233 . . 54251 

<223> 5-375-259 .mis complement 

<220> 

<221> primer_bind 
<222> 54250. .54268 
<223> 5-375-296 .mis 

<220> 

<221> primer_bind 
<222> 54270 . . 54288 
<223> 5-375-296 .mis complement 

<220> 

<221> primer_bind 
<222> 54353 . . 54371 
<223> 5-375-399.mis

<220> 

<221> primer_bind 

<222> 54373 . . 54391 

<223> 5-375-399 .mis complement 



<220> 

<221> primer_bind 
<222> 54848 . . 54866 
<223> 5-376-266 .mis 

<220> 

<221> primer_bind
<222> 54868 . . 54886 
<223> 5-376-266 .mis complement 

<220> 

<221> primer_bind 
<222> 55670 . . 55688 
<223> 5-377-82. mis 

<220> 

<221> primer_bind 

<222> 55690 . . 55708 

<223> 5-377-82. mis complement 

<220> 

<221> primer_bind 

<222> 55815. .55833 

<223> 5-377-227 .mis 

<220> 

<221> primer_bind 
<222> 55835. .55853 
<223> 5-377-227 .mis complement 

<220> 

<221> primer_bind 
<222> 59918 . . 59936 
<223> 5-14-165. mis 

<220> 

<221> primer_bind 
<222> 59938. .59956 
<223> 5-14-165.mis complement

<220> 

<221> primer_bind 
<222> 60961. .60979 
<223> 5-11-158. mis 

<220> 

<221> primer_bind 

<222> 60981. .60999 

<223> 5-11-158 .mis complement 

<220> 

<221> primer_bind 
<222> 66473 . .66491 
<223> 5-202-117 .mis 

<220> 

<221> primer_bind 
<222> 66493 . . 66511 
<223> 5-202-117 .mis complement 

<220> 

<221> primer_bind
<222> 66495..66513
<223> 5-202-95.mis

<220>

<221> primer_bind
<222> 66515..66533
<223> 5-202-95.mis complement

<220>

<221> primer_bind
<222> 71815..71833
<223> 99-1605-112.mis

<220> 

<221> primer_bind 
<222> 71835 . .71853 
<223> 99-1605-112 .mis complement 

<220> 

<221> primer_bind 
<222> 71974 . .71992 
<223> 5-2-178. mis 

<220> 

<221> primer_bind 
<222> 71994 . .72012 
<223> 5-2-178. mis complement 

<220> 

<221> primer_bind 
<222> 85683 . . 85701 
<223> 5-171-204 .mis 

<220> 

<221> primer_bind 
<222> 85703 . . 85721 
<223> 5-171-204 .mis complement 

<220> 

<221> primer_bind 
<222> 86485 . . 86503 
<223> 5-169-97. mis 

<220> 

<221> primer_bind 
<222> 86505 . . 86523 
<223> 5-169-97. mis complement 

<220> 

<221> primer_bind
<222> 87116 . . 87134 
<223> 99-1572-440 .mis 

<220> 

<221> primer_bind 

<222> 87136 . . 87154 

<223> 99-1572-440.mis complement

<220> 

<221> primer_bind 
<222> 91074 . .91092 



<223> 5-403-325 .mis 



<220> 

<221> primer_bind 
<222> 91094. .91112 
<223> 5-403-325 .mis complement 



<220> 

<221> primer_bind 
<222> 91105 . . 91123 
<223> 5-403-294 .mis 



<220> 

<221> primer_bind 
<222> 91125 . . 91143 
<223> 5-403-294.mis complement



<220> 

<221> primer_bind 
<222> 91190. .91208 
<223> 5-403-209 .mis 



<220> 

<221> primer_bind

<222> 91210 . . 91228 

<223> 5-403-209.mis complement



<220> 

<221> primer_bind 
<222> 91243 . . 91261 
<223> 5-403-156 .mis 



<220> 

<221> primer_bind

<222> 91263 . . 91281 

<223> 5-403-156 .mis complement 



<220> 

<221> misc_binding 
<222> 4960 . .4984 
<223> 5-381-133 .probe 



<220> 

<221> misc_binding 
<222> 5456 . . 5480 
<223> 5-382-162 .probe 



<220> 

<221> misc_binding
<222> 5604 . .5628 
<223> 5-382-310 .probe 



<220> 

<221> misc_binding 
<222> 5610 . .5634 
<223> 5-382-316 .probe 



<220> 

<221> misc_binding 
<222> 13146. .13170 
<223> 99-7190-213 .probe 



<220> 

<221> misc_binding 
<222> 23749. .23773 
<223> 99-7203-282 .probe 

<220> 

<221> misc_binding 
<222> 23753 . .23777 
<223> 99-7203-286 .probe 

<220> 

<221> misc_binding 

<222> 27916. .27940 

<223> 5-383-42 .probe 

<220> 

<221> misc_binding 
<222> 28058 . .28082 
<223> 5-383-184 .probe 

<220> 

<221> misc_binding 
<222> 30049 . .30073 
<223> 99-7205-228 .probe 

<220> 

<221> misc_binding 
<222> 32738. .32762 
<223> 5-384-312 .probe 

<220> 

<221> misc_binding
<222> 48177 . .48201 
<223> 5-379-80 .probe 

<220> 

<221> misc_binding 
<222> 49603 . .49627 
<223> 5-380-58.probe

<220> 

<221> misc_binding
<222> 49604 . .49628 
<223> 5-380-59. probe 

<220> 

<221> misc_binding 
<222> 50292 . . 50316 
<223> 5-366-143 .probe 

<220> 

<221> misc_binding 
<222> 51121. .51145 
<223> 5-370-197 .probe 

<220> 

<221> misc_binding 

<222> 51171. .51195 

<223> 5-370-247 .probe 

<220> 

<221> misc_binding 



<222> 53522 . .53546 
<223> 5-373-98 .probe 

<220> 

<221> misc_binding 
<222> 53588. .53612 
<223> 5-373-164.probe

<220> 

<221> misc_binding 

<222> 53646 . . 53670 

<223> 5-373-222 .probe 

<220> 

<221> misc_binding
<222> 54161. .54185 
<223> 5-375-200 .probe 

<220> 

<221> misc_binding 
<222> 54220 . .54244 
<223> 5-375-259.probe

<220> 

<221> misc_binding 
<222> 54257. .54281 
<223> 5-375-296.probe

<220> 

<221> misc_binding 
<222> 54360 . . 54384 
<223> 5-375-399.probe

<220> 

<221> misc_binding 

<222> 54855. .54879 

<223> 5-376-266 .probe 

<220> 

<221> misc_binding 
<222> 55677 . .55701 
<223> 5-377-82 .probe 

<220> 

<221> misc_binding 
<222> 55822 . . 55846 
<223> 5-377-227.probe

<220> 

<221> misc_binding 
<222> 59925. .59949 
<223> 5-14-165 .probe 

<220> 

<221> misc_binding 
<222> 60968 . . 60992 
<223> 5-11-158.probe

<220> 

<221> misc_binding 
<222> 66480 . .66504 
<223> 5-202-117 .probe 



<220> 

<221> misc_binding
<222> 66502 . .66526 
<223> 5-202-95 .probe 



<220> 

<221> misc_binding 
<222> 71822 . . 71846 
<223> 99-1605-112 .probe 



<220> 

<221> misc_binding
<222> 71981. .72005 
<223> 5-2-178 .probe 



<220> 

<221> misc_binding 
<222> 85690 . . 85714 
<223> 5-171-204 .probe 



<220> 

<221> misc_binding 
<222> 86492 . . 86516 
<223> 5-169-97 .probe 



<220> 

<221> misc_binding
<222> 87123 . . 87147 
<223> 99-1572-440 .probe 



<220> 

<221> misc_binding
<222> 91081 . . 91105 
<223> 5-403-325 .probe 



<220> 

<221> misc_binding 
<222> 91112 . . 91136 
<223> 5-403-294 .probe 



<220> 

<221> misc_binding 
<222> 91197 . . 91221 
<223> 5-403-209. probe 



<220> 

<221> misc_binding 
<222> 91250 . . 91274 
<223> 5-403-156 .probe 



<400> 



ccttcgaagg cattattttt atggcatttt tatgacacat ggaagctttc atgaaccaat 60

ttttagataa ttgtatataa ttttccattt taaaaagtgt gaaaactgat acttccataa 120

ggcaactggg gataccctga atgccctctg gggtcaggaa aatgctttgg tgccacctgc 180

cggtttccaa agatgtttca ggaacttgct cctgttgatt tccaaatctt tttttttttt 240

tttaattcct agctccctcc cagtacattt caaaatacca aaaaaaaaaa aaaaaaaaaa 300

attataaatt ttttggtagc aagagcacaa gtgctcaagc ttataaaaat gcaaataaat 360

ttgtttggga tgcaatatga tgaaacacat acttctcaat catttaacta gtcaattttt 420

tttagcatat tgccaaaatg tagatttcat atgttgactt tacattgcta attacacaca 480

tcctatttct tttctcgtta tttttctttc tttctttatt tttacttttt gcgactccct 540

ctggtaccca ggctggagtt cagtggtgca atttcagctc acttcaacct ctgcctccca 600

ggctcaattg atcctcccag gctcaactga tcctcccatt ttcagcctcc cagggcgtgc 660

taccatgccc ggctaatttt tgtgctcatt gcagaggtgg agtttcccca tgttgctcag 720 

actggtcttg aattcctggg ctcaagcgat atgccagcct tggcctctca accttgctgg 780 

gtttacaagc gtgagccact gagcccagca acagatatat tttcaagtgg atggtatccc 840 

atcagttgtg atatatgatg taaacactct actaataatt aaactttgaa gtttgtgaaa 900 

attttacttt tattatagtt agaataattc taagttattc ctataataat gctacactta 960 

ttcacttgaa ttctgataca catttcttga acaggaagga gatacagata cagcttatgc 1020 

acatttatat tcattcattc attcgataaa taaatatgta ttgttaacca cgttccaggt 1080 

actttatttg gtagttagga tttagtagta aacacgagaa agtcttcaac atcatcctta 1140

gatcgtgatc tctgcatatc acaaatcata caaaataaat ttgcttaaaa atgtgggaac 1200 

ctgcctttca aaacctgcca tttagcacta ctgtggcata acctataaac ctaaacatag 1260 

accctcatga tttatgcatt taagtttgtg ggaaataggt ctcttgtccc ttgtcctgaa 1320 

agtaaaagac aaccctgtct gaatacactg aatatccgtg gattgtactg tttccggacg 1380

ctgcctaaga gcatagggag aatttgtttt tttgtttgtt ttttgtttcg ttttattttt 1440 

gagacggagt ttcgctcttg ttgcccaggc tggagtgcaa acggcgcgat gtcggctcac 1500 

tgcaacctcc acctcccggt tcaagagatt ctcctgcctc agcctcccta gtagctggga 1560 

ttacaggcgt gcgccaccac gcccggccaa ttttttttta gtagagtcag gattagtatt 1620

attagtagag atggggtttc accatgttgg ccaggctggt ctcaaactcc tgacctcagg 1680

tgagccgccc gcctcagcct ccaaagtgct gggttacagg catgagccac cgcacctggc 1740

cgggacccga ccaggatgct gaatacagaa atgcttaggt aagagaaaag aaaagttaat 1800 

ttgtcacact tttcctttca aactacatga acatattttt gcattataaa gtattatatc 1860 

taagtagttc caaacatgga atttcttatt tccttttttt ccccccaatt tatggttctg 1920

gatatactca ggaattagtg tagaattctc aacaatcaga tatggttgct gaggaacatt 1980 

taacaatatt aaacaattca catgactctg aaatttgaaa ataggtagat acagacataa 2040 

catgaacaaa gggtgatacc aattctttac actggcaact aggtggacat tgaatgatac 2100 

gcttgtgagt aatttacttt aatgaacaat ttcattaagt aatatttacc aaaaaaacaa 2160 

atacaacttt agatttattt aaattatttt acttaaaatt ttgtcactaa ttaaaccccg 2220 

tctctactaa aaatacaaaa attagctggg tgtggtggca ggcacctgta atcccagcta 2280 

cttgagaagc tgaggcagag gattgcttga acctgggagg cggaggttgc agtgagctga 2340

gatcacgcca ccacactcga gcctgggcga cagagcaaga ctccatctca aaacaaaaaa 2400

aaaatattgt cactaattat actttacatc ttataagaaa ggtaaatctt ttgaaaaaag 2460 

tgaaaaagat ttaatgtatt gctttttaat ttaatttata tttttattga aacattcaaa 2520 

ctatatgttt tgaatataat taaattttat ttttaatcct ttttgatcat tatttctgat 2580 

agaacacaat tacatgaaaa tcttgatcaa acagcataca tggtaatttt gctgaaatga 2640 

aggtaaattt tcatgggcta aatatatagg aaatgtatta actatagatg tctttatcac 2700 

tcatccaaaa taatcagcca atcaatagga cacccggaca ggaatgatat aattaaatgc 2760 

aatcagattt tgctgatttt catctatgta aaaacatttt tattttgcca ttataaatgt 2820 

ttactcacca atattgagag ttatagcata tcctagttaa taatgtgtta agttaattta 2880 

taacttttaa atatttacac ctacagcagt gagtccatct gtactctttc tcaggctcca 2940 

taagtcttag ggatgggctt tatgccaacg tgctgaagcc aatattatag tgagggaata 3000 

caagaaataa acaggtaaac aaacagacaa atcaggtcat ttcaagtagt gataatggct 3060 

atgaagaaaa taccagcttg gtacatctgt ccgtcagata aaaatatata attcaagatt 3120 

attacatttt ttttaaaacc aaagcttttt ttaaaaaaaa aattacattt atgaacatct 3180 

gacttgtttt cctttttact ttccaaagta aaattcggca tggcactata caccatcaca 3240 

gctgacatag gaaggactga gtcaaatctt tgtagcactt tttcaagttt cacttaaata 3300 

aagcttttaa aaaatatata gggtattttt taagcaaaaa aagcaaatta tcttatcaat 3360 

gaaacagacc tggtgttcat ttcttttaaa gtaccgaaag ctgattgctt ctgtaaaggt 3420 

aaaactcctg tgacatgtta gaaagaaaaa aaaaattcct ttgagagata tgtttgtaag 3480 

aatgaaatag gtactactag aattttcatg ttattctctg caaggcactc aacaccacat 3540 

gaaaagaaga ttattaacag tcagtagaaa tactaataac tgaagaaaat atttggttgt 3600 

tttaaatgct tttaaagcaa accaacaaca aaagattctg tttgtaaatg ggagagaatc 3660 

tgcatgaggt atagacaacc agggcctcca aatttgtagc tgtgtttctg acattctcca 3720 

gggaagacgg ttacagaaag acttgacccc ctggccccgc agagctcttc agagaaatta 3780

atgcatccag aaaagacaga gcatcagatc tcactccttc gtctggaaga cgtcagttca 3840 

tcctagttct agcgcatacc ggtgttttgg aaacagatta gctatattca tacataagga 3900

tactcttccg caacactatc tgtagtgagg ccaagaccag tggttgcggg aatcttcgca 3960 

aacaggcaag agacaatttt taggggcgat ggaaactgtc attttgactg gggaggtcat 4020 

tacacctata catgcatttg ctaaaagtca tcaaactctt ccactgacgt gggtgtaacc 4080 

attgtttgta aattatacct caacaggatt cgattaattt atttattgag acagtctcgc 4140 

tctgtggagt gcagtggtgc aatcttggcc cactgcaacc tccgactccc gggttccagc 4200

gattcttctg tctcagcttt cctagtagct gggattacag gcgcccgcca ccacgtccgg 4260 

ctaattcttt tatttttagt agagacgggg tttcgctatg ttggccaggc tggtcccgaa 4320 
ctcctgacct caggtgatca gctcgcctga gcttcccaaa gtgctcggat tacaggccgc 4380

gcctgactat gattcgtttt gaaagaaaaa aatatatgcc atttacccct cgggaatgga 4440

aacatagagg agtgacaaga tctcccccaa gctctggggc ggtggagtcc agcatttata 4500 

gatcggtttg cactaggagc agaagctcct ttcacgacaa tctcggcctc ttcccacttt 4560 

gtagagtgag taacaagctc ggagagatga aataatttgg tggcgttaac ccggccgaca 4620

ggcgccagcg ccaggatttg aacccaaacc acgtgacttc acaacccata gctttcaaca 4680 

ctacgcggtg ctgctggcac ctagtaaagg ctggatcact actgaatgaa tgattctggc 4740

tacggatcct taaagcccac agaaggccca tccagagacc gaaagcttca gacacaagcc 4800 

gcagagcaga ccgctaaacc ggagctacag aggcgaagct cagacttgag cctgagtccg 4860 

gcgggctgag gggcgggctt tcgtctcggg aggcggagct gtctcgtcgc attcccggca 4920

agcttgaacc tcttcacttg ccgtagcgcc tgcagcagga agttgctcta crgcatgcct 4980 

taggtttccg ggtgagggtt gggctccttg gtaccatgtg ggaagcgctg tgaagagttg 5040

ttgccttcca agatataccc aaattcccag ttccaggtaa gcggcacaga gccgcttgat 5100 

gtggctgcgg atgggggcgg catatcgagg gagggtaaga gttttccgga tatctgcgga 5160 

atcagggttg aaggaaagcc ttggcgcggt cgccgctact gtaattagtt gttaacgctg 5220

ctgccatcgt ctttgcatct ccggggtcca caaatctcag gacacccgcg ttgtgtgtcc 5280 

atgacggtgc tgagtgcaga agagaattgt ttgtttacga ggcgccttat aatttcgtag 5340 

aaacttatca aagtgcttac gtttttttag cccgtgtcat taaaactccg ctggcgtgaa 5400 

agatgacgtc cttagcccag cagctgcaac gactcgccct ccctcaaagt gatgccagcc 5460 

tcttatcyag agatgaagtt gcttctttgt tatttgaccc taaggaagcg gccacaatcg 5520 

acagggacac cgccttcgcc attggtgagc catcttttaa cttagaaaag ctcttggaag 5580 

cgtttgtttt ctggatgtta ctgttttttt tttttyccct tsttttctct tctgtcccgt 5640 

cctcttcctt agcagtttct agcatgttga tgtatatttt taagggaaag agaacataac 5700

agtcaggtgc ttggtgcttg aaatgcttat gagtagaggt atctggattt cacagatgaa 5760 

gaaacaaaat tagagaggtc aactaatatg tcaaagagaa gaacagctaa aagggggatg 5820 

gaggcagtga ctggggtgac ggagaagtcc tctcagagga ctggcctagt tattgtaggg 5880

gatatccaaa aaaaaaaaaa aaaagtgttc cctgccacga agtactttct gttctagtag 5940 

atgagatagg atttactcta aaagttgaaa actagaaatc agtgtttggg actgtattat 6000 

aaattatatg tacttaaaga attcagaaac ttcttagggg taagtaaaac tgtaaaagag 6060 

gggtgggatg aatttgctta cgaagaattg ttaaataaac tggccttttt tgaaatttag 6120 

gatgtactgg cctggaagag ttgcttggaa ttgatccttc ctttgagcag tttgaagcac 6180 

cgttgttcag tcagctagca aaaaccttgg agcgaagtgt tcagaccaaa gcagtaaaca 6240 

aacagttgga tgaaaacatt tcattattcc ttattcactt gtcgccttac ttcctgctta 6300 

agccagcaca gaagtgtctg gagtggttga ttcacaggta gctaatagaa ttacagaaat 6360

aactatgggt taatcacttt ggtcttgtaa aaaattaagt agttgagaac ttatcaaatt 6420 

aaagctgaaa aattaggtat attaagagat tgatacaatg tccattgtgt gagataagca 6480

tgctcttctg cctttaaatt cttttttgtt gatagtgaaa agtagaacta ctggtgattt 6540 

tgagtaaatg gtctaacctc attagttttt ttttgagata gagtcttgct gggcactcag 6600 

gctggagtgc aatattgaaa tcttggctca ctgcaacctc tgcctcccag gttcaagtga 6660 

ttctcccacc tcagcctccc aagtagctgg gattacaggc acctgccata atgcctggat 6720
gagttttgtg tttttgtaga gatggagttt caccgtgtta gccaggctgg tctttaaccc 6780
ctgacctcag gtgatccacc cgcctcggcc tcccaaaata ctgggattac aggtgtgagc 6840 
cacctcgccc agcctaacct cgttagttct gaagagccca tgctactttg ccattacagc 6900 
ttgttcctgc acacttgagg cagtgagtac taagctgctt tcttggcaaa ataacaagat 6960 
ggttggggaa gcaggcaatg ttatcctctg aaataccatt acttgagaaa aacaatgata 7020
gctaatgtgt ttttgagcat ttatacacta ggccctggta agcactttat ctgcattatc 7080 
tcatttaatt ctcacagcat tcctctggtg agattatttg cattatcctt gttttacagt 7140 
tgagaaaact gaggcttaga gggattaagt ttcttgagtc acacagctag taagtggcag 7200
agctaggata cagctgaagt ttatttccaa ggtctgatct tttatctgct ttgggaattg 7260 
tctcagtaag cttagtttat tttctcacat attggtgtca catcacagca tacacatttt 7320 
tgttttattt attctcccaa tatgtatgtc tttttttttt gagacagtct cacactgtcg 7380 
cccaggctgg agtgcagtgg cacgatctca tctcactgca acctccacct cccaggttca 7440 
agcaattctc ctgcctcagc ctcccaagta gctgggatta caggcacata ccactacgcc 7500 
cagttaattt tttgtatttt ttttttttag taaagatggg gtttcatcac gttggccagg 7560 
ctggtctcaa actcctgacc tcgtgatcca ctcgtcttag cctcccaaag tgctggtgtg 7620 
agccaccaca gccactgcac acggctgtct ttttattttt attttattga gtgtcaggat 7680 
ttcactcacc caggctggag tgcagtggtg tgatcgtggc ccactcgatc tccagggctc 7740 
aaatgatcct ccgatctcag ccacccaagt agctgggatt acaggcatgc accaccatgc 7800 
ccacctagtt tttttttgtt tgtttgtttg tttttgtttt tgtttttcca ttttttgtag 7860 
agacagtgtc tcactttgtt gcccaggctg gtctggaact cctggcctca agcgatcctc 7920
ctgccgagtc ggcctcccaa agtgctgggg ttacaggcat gagccaccac atccagtcca 7980 
acttacatgt tttaaaagta gatttctatt ccattaattg gagacatggc ttcaaggcca 8040 
gcctggccag catggtaata cctgtctgtg tgaaaaatac aaaaattagc tgggtgcagt 8100

ggcgagcact tgtaattcta gctacttggg aggctgaggc aggagaattg cttgaatcca 8160 

ggagacagat tttgcagtga gctgagatca cgccactgca ctccagcctg ggtaacaggg 8220 

agacgccgta tcaaaaaaaa aaaacaaaaa atggagacgt ggggataaac taattttttt 8280

tcaagaagca agtgagtcat caagaaaaca tgattttaac ttgggtcttc tgattgtcag 8340 

aacattaggg caatcagaca aagaatattt aaaagtctgt aatatttcca tctgttttct 8400 

agcacctaac tttaccccca aatagatcat taatgtaagg aatacttttt gcatttgatt 8460 

tttcatttta tgtcttcaga taacttattt attcagcaaa tactaagttc agtacataga 8520 

taagtaggtt gccgttgatt actgttttga aataaatgcc ataataaagg attaagcaga 8580 

gttttgtgga tatagttctg cctggaatag ttgaagaaag cttcatggaa aaagtaaccg 8640 

ctaaaaccgt atgtcaaact aagaggctgg gttcagtggc tcacgcctgt aatcccagca 8700 

ctttgagagg ctgaggcagg ccagtcactt gagatcagga gtttgagacg agcctgacca 8760 

acatggtgaa tccctgtctc tactagaaat acaaaaaaaa attagctggt tgtggtgctg 8820 

tgcacctgta attccagcta cttgggaggc tagggcatga gaatcgcttg aacttgggag 8880

gcgaaggttg cagtgagctg agatcacacc actgcactcc agcctgggtg accaagtaga 8940 

ctctggaaaa aaaaaaaaat ctaataaaag gcaatgtatg gtactgttta aaaaaagtac 9000 

ccgatgtaaa aggataggag agttttcaca gaagagggat cagttggcag aattagaaaa 9060

aacatcttag ccaggcatgg tggctcacac ctgtagtcct agcactttgg gaggctgagg 9120 

caggtagatt gcctgagctc aggagttgga gaccagcctg ggcaacatgg tgaaacccca 9180 

tctctactaa agtaaaaaaa aaaaattagc tgggcatggt ggtatgcgtt tgtagtccca 9240 

gctacttggg aggcggaggc aggagaatct tttaaacttg gaggtggagg ttgcagtgag 9300

ccgagattgt gccactgcac tccagcctgg gcaacagcaa gactccctct caaaaaaaaa 9360 

aaaaaaaaaa aaaaaaaaaa aatgttgaca gaccagcctg ggcaacacag ggagacctag 9420 

tctctacaca gaaataaaaa attagccagc tgtgatggtg ctcccctgtg gtcccagcta 9480

ctcaggagac ctaggtggga ggatcatttg agcctgggag gttgaggctg cagtgacccg 9540

tgatcgtgcc actgcactcc agcctgagtg gcagagcaag actctattca aaaaattaaa 9600 

taaaaattgt aaagctgaga gaaagttaat gggatgtata ttttgatgta gaatattcaa 9660 

agatattgca taggaattgg ctggaccatg ttgagggata tgtgaatgca aaattgggga 9720

ctttttaaat gctctgtatc ttataatttt gtgattttta ctgctactcg ttctaggttg 9780 

ttttagctta ttaataatac agtttgggca tgtgaggaaa taggtgctta ttattaagtt 9840

ttgtgttcag ttggactgtt tgcttttttt ccccaggttc catatacatc tctataatca 9900 

agatagcctc attgcttgtg ttctgccata ccacgagaca agaatatttg tgcgagtcat 9960 

acagcttcta aaaattaata attcaaagca cagatggttc tggttgttgc cagttaaggt 10020

ataattgctg aatgacatat gtctgtaatc attacagatc ttgattaagg gatttaatta 10080 

ggaaaaaata agtcattgct cctggggata gtccagataa ctacctccat catttttatt 10140 

tcctgacttc caattgaaac tattattaat gttaataata tttagctttt atttagtatc 10200 

tgacatcagg ctaggcaatg tacattatat tatcttggat tatcacaaaa cttcattttt 10260 

acagatgagg acaggctact tgccaaagtt cacataacta taaagtggca gagttgagat 10320 

ttgaacccaa atctgattcc agagtctgtg cctttgtatc gtaccatgcc gtttggcatt 10380 

ttaaatttgg aactgagaac ttaaaaaaaa gttgaaggtg aagaaaggca gaaaatactt 10440

acaaaaatta aaaggataag gataacttca ggggtttcaa gaatttacta aagttgtgag 10500 

gaagatgcta ggagttaatt tacagactaa gaaaaggatg cttgtgtaac aggaagggtc 10560 

tgggtaagaa aagacagcat gctattgtag gaaaaccaac tgactaaccg ttagcacttt 10620 

gtttgcaaca tatggtggag gaaagtagat gtatgttgtt agcaggaact tgactgcatg 10680 

ttgggcatgg tgatccaggg taggggggga ggcaaatggg gatgaaggta agctaacaga 10740

tcacgaggct gtgcacatta ggctgatgat ggtagtgggg tcagtgttca gaattagtaa 10800 

gataggagta atgaggactt tttaattaga aggagaattt aagaatctga acgtaaagca 10860

cttgaactaa agaacgttgg agaaggccag cctttgataa agctcattaa ctgctcttcc 10920 

caacattcac tctgcccata gcatcaaaat tattcttcct ctaatctacc ctggatcttg 10980 

ctgctcccta ggcactctta ttccaggtag ccaaactcca ctcacctact aaagttcact 11040 

taaaattctc ctttcattta tttaatcagt atttatggag catctattac ataccatgta 11100 

caacgctaga tggtgggaat agtggtataa gttgtgcagt ttcctagtct atttctgttc 11160 

ttagtgttat tgtaaagcaa tgacagaagc gtatcatagt ggctcagaaa ccaaatcgaa 11220

ccagtaaagt tactaagaat ctagagaatc cagattttga taaatacctt tcaaaatgtg 11280 

acatacgata agcaaacatc tttcactaaa tacctctgcc ctcattgctg accttctctt 11340 

ttgaatttct aagtttacat gattttatgt gttaatttca tcaccaccac cagttataag 11400 

gagagattat attttttatt tatgaattgt agcagtatga agacattact agggtaaatg 11460 

ctttaaaata aggatcagtt gatgaatgtt gctagtagtt tgttctttcc atcaataaca 11520 

gcaatctgga gtgccgttag ctaaaggaac tttgattacc cactgctaca aagatcttgg 11580 

attcatggat ttcatttgca gtttggtgac aaaatctgtg aaggtgagca gtctgtttca 11640 

tgagtatata attttatgaa aagattgctt gctttgaatg aagaaaacat actaaaacat 11700

tccctaataa caatgatact ttggataaat ttattttgtt aaatggtctg gtgttttgaa 11760 
gcaggagcag tttgagagtc cgtatctttt ttttttttta agaatcagtc ttttatcacc 11820

aaaggtgttt ttctacaaaa ataaatgtct attccttgcc agattctagt tacagtgact 11880 

attcaaagag agtgtctaga aatgtcagga atattcaacc tgggaaagct gtttaaaaaa 11940

ttttaaggcc aggtgcggtg gctcacgcct gtaatcccaa cactttggga ggccaaggca 12000 

ggcggatcac ttgaagtcag gagtttgaga ccagcctggc caacatggtg aaacaccgtc 12060 

tctactaaaa atacgaaaat aaactgggca tggtggggca tgcctgtaat tccagctgct 12120

cgggaggctg aggcaggaaa atcgcttgaa cctgggaggc ggaggttgca gtgacccaag 12180 

atcatgccac tgtactgcaa cctgggcgac agagactcca tctcaaaaaa aaaaaaaaat 12240 

ttttttttta aataaaaatg ttaggaatat cattaggcag ttaattgttg tcacattgtg 12300 

tattcattgt tgcaaaggta attcaggaga gctgtaaata taatttggcc tttcactttt 12360 

tttttttttt tggagacatg gacttgcttt gtcgcccagg ctagagtaca gtggtgccat 12420 

catagttcgt tgaaatctca gccttgaact cctggcctca agcaatcctt ctacctccct 12480 

ttcactgtta aatgtgtttt gtttgtgttc cttgtttcag gtttttgctg agtacccggg 12540 

cagctcagct cagttgaggg tgctcttggc tttctatgct tctaccatag tgtcggcgct 12600 

ggtagctgca gaggacgtat cagacaatat catcgccaaa ctatttccct atatccaaaa 12660 

ggttggcact gctgatgtgt taagtagatt attttgtact taaaggaatt ttcttgcttt 12720 

cgaaagtttt tttagattta agtgttttta aattgacagt ttatttcaga tgatagctga 12780

gatttagcct ttaggttgaa aatatgacac ttttttatta gaaactcact ggactgggac 12840 

cttaattagg actcttaaga ataaatattg gctgtctggt cctgcggcca tctcctagat 12900 

tgatttccat agcagtcttt gtacctcact ggaaggagga cggagcagac agtctctttg 12960 

aggcgtaagc agcctctcag tattctttgt gcactggctc ctgcctctca gcgtttctcc 13020 

ttcccaagtg ccttcttgcc tgctgccttc ccaggtgccc tgtggaggta ctgctttcac 13080 

ttcccaccag tgtcccaact tgtgaccttt catcagactt gtttcttcca ttagtgatct 13140 

gattgaggtc tccctacyat aagtaggatt ttatgtataa aagaagagct tactggctcc 13200 

tgtcaggaca tgtggtagat gtttgagttg ggaaattttc tgagatcctt tgtctcgttc 13260 

aacagacttg tctcatctct gtatccactc tgaaaaaggg gtcagctcct ttattgttta 13320

tgtctgaaga gtgattgact atgcattagg ttgtattaat ctctatgaca tttctaattt 13380 

gtcagattaa catttaaagt agcagaaaat aatatggttt atcatttttc cttatattta 13440 

aaaaatattt agggattgaa atcatcttta ccagattaca gagctgcaac atacatgata 13500 

atatgtcaga tttctgtgaa agtgaccatg gaaaatacct ttgtgaattc attggcatca 13560 

cagatcatca aaacattgac caagattccc tctttgatca aggatgggtt aagttgcttg 13620

atagtgctcc tgcagagaca gaagccagag agccttggga aaaagtatgt acaattgaat 13680 

tgagaaatgg tgctagtcag aggtgaataa aattattttg aataattttt ttttgtgaga 13740 

taagtgatta tatattttct tatattgtta ctcattgtct aacttgtaaa gtcaacacga 13800 

tatgtacttt tgcttcttca aaggccattc cctcacttat gtaatgttcc tgatcttatt 13860

acaatacttc atgggatttc tgaaacttac gatgtcagtc ctcttctgcg ttacatgctt 13920

ccccatctgg tcgtctccat cattcatcat gttacaggtg tgtggtttta tattttttgt 13980 

ccagaaattt tctaagattt gatcttaaaa tagtaaccat atcctggtta acagcttcaa 14040 

aatatttaaa atttctgttt tccagttgtt cttgtgtaac ttgtctattt cttaagtgag 14100 

aaacatctgg ggtggggagc aggttggttg aagaaagatc attgttaatt gagtaatttc 14160 

ttagaatttt acttttttta agatctgtgt tcttaaatac ttaaatagtc tctacatgaa 14220 

aaagactgga atacttttaa aatttattac tgagtaaacc tttgccttct catttaggta 14280

tttaatgaac tttagtgatt catttacaat gaatatctca tccagttgcc aaaaaagttt 14340 

tttccctaga gtaattaaaa atataagacc aagaaaattt ttatacataa aaatccaaat 14400 

tatgaaacaa agcaaaaagt aataattaga gggccaggtg gctcaaacct ataattctag 14460

cactttgaga aactgagatg ggcagatcat ggcaaaaccc cgtctctaca aaaatataca 14520 

aaaattaacc gggcatggtg acatgcaact gtggtcccag ctgctcggga gactgaggtg 14580 

ggaggatcac ctgagtccag ggagttcgag gctacaggga actgtgtttg tgccactgca 14640 

gttcagcctg ggcgacagag tgagacccta tctcaaaaat aataataata attagaaagt 14700 

gattatttgc attatgtctt ttaataaata cagaaaggct aggaagaaac atttggttct 14760

gttgctactt ggagaatact aagaaagact tctctaaact tatgtcagtc tactgaaaaa 14820 

tgaggttttt gtaaccaaat cttggtttga acatgcattt tagaaaagtg actagcaaaa 14880 

tgaaaatgta ttcttcctat tctgaaggta atgagtaatg tcccagtctt tgaaagcaag 14940 

ggctaagatc agattcatat gtttattata aatctgaaat agatgacagc ttttggttca 15000 
gtaagggggt gggtactgag ataaaagttt aacctctttt cagagatagt tgattggcac 15060 
catcacaact aatttgtaat gtgacctggg gaaagactaa ttttggactt ttgttttgtt 15120 
attgttataa gcatgatagt acttgttaaa aatagagaat tttagataaa agaaaaatga 15180
aggtgatcta tacaaaagag ctttaaaaaa tataaatgac taaataagtg gaatgcactt 15240
gtgcaaatag ataaaaagct ggatatcagt gtgtgtgttt tactctgaga acgattttct 15300
gctataagtg aatttaataa tgttagcttg caaaataaat ttttcccaac acaattttaa 15360 
tttttacata ttaaggagaa gaaactgaag gaatggatgg tcaaatctac aagagacact 15420 
tagaagctat acttacaaaa atatcactga agaacaactt agaccatttg ttggctaggt 15480 
aagctattat ttttgacatg cttttgattt acatttttgt gactcagtat aattttcaag 15540

attggttaga aatttcccag tattcctttt taatgtatat cttgttttgc ctgtatcatt 15600 

tcattaacta taagtactta atccagtaac tggcagcatt aaaagcaaaa gtaatttttc 15660 

ttgcatgtgt ttagtgctta tagatgaaaa tggaaagtta gtggttaaga caaatttggg 15720 

taataactaa atgcagtttg tggccgtttg aagttttctc ttagtaaatg gaccataata 15780 

tctgaaataa cacatcacaa ttaaaacgca gtaaatgtca tttagaaaca atattgaagg 15840

taagcataga gtaaggatta tttctttaaa gataattcag tctttttttt tttttttttt 15900 

ttttctctta atgggcaact cgatacagtg aaaagatgta ggcgttgtgg tcagacatag 15960 

gtgtaaattc tgcgcctacc acttccaggg ctatagtaag ttttggaggt tagtctggat 16020

gtagctttgc cattttgtag gatcaagaga gtgcctgtgt gttggtgtgc atcattcatg 16080 

cccctttgct cagagggttg gtttgtgaat taaaggagag tatacacgtg tagtaatgtt 16140

gagaacatag gctctagaat cagactgggg cttaaaatgt gaaaccccac tctgtggcac 16200 

agaagggctt actgaatgtt gccttttatt gctgttctca tgagcaaatt ggaagactta 16260 

cactctagtt tcttttcctt tttttttttt ctttaagatg gagtttcgct cttattgccc 16320 

aggctggagt gcagtggcac aatctcggct cactgcaacc tccgcctccc aggttcaagc 16380 

gattctcctg cctcagcctc cctggtagct gggattacag gcatgtgcca ccatgcccag 16440

ctaattttgt atatttagta gagacaaggt ttcaccatgt tggtcaggct gatctcgaac 16500 

tcctgtcctc aggtgatccg cccacctcgg cctcccaaag tactgggatt acaggcgtga 16560 

gccagcacca tttaagatag caatgtagtc ttgctgtgtt gttcagacag gtctcgaact 16620

cctgacctta agcagtcctc ctgccccggc ctccgaaagt gctgggatta taatgccagt 16680 

gatgtgaacc actacaccag gcctctagtt tctttatctg taaaataagt attaaactat 16740

accttgtata agtatgacaa ttagagataa tatttctaaa gcacttagaa tatagtaggt 16800

actcaataaa tagtaactct tatgtatgct gatatttctg tttttttttt acagccttct 16860 

atttgaagag tatatttcat atagttcaca ggaagaaatg gattctaata aagtgtcttt 16920 

gcttaatgaa caatttcttc cactcattag acttttagaa agcaagtaag ttatgtgtgt 16980 

atgtttatgc tcttctaaag tacttcctgt tctataaaga tatgattcac aagtcacatc 17040

ttaatatact gaattgtaca gagactgtcc tttttaaatt tgttcttcaa gaaggggtga 17100 

gtatcggaat caaaaatatt tgaaatataa gaggaaatag tggttgtgtg ggggctatgg 17160 

agagtataat tttttataga gagactattt tgttattgga gtagtcatag taacacactt 17220 

gaccaatgtc atttggtttt acctacaaca tttgttaaaa tttaagtcac agtctcagta 17280 

atttttaaga ttttatgtct ttcctttata tgtgtatggt gtatgagatt aataaattta 17340

ttaagaataa tgaattcact ctttaattgc ttcccagata ccccagaaca ttagatgttg 17400 

tattagagga acacttaaag gaaattgcag atctgaaaaa acaagagctt ttccatcagt 17460 

ttgtttctct ttctacaagt ggaggaaagt atcaggtatg ttgttctcca aaggaattat 17520 

gacaatttgg ggtacatttg tatgggtatg tgaagagctc cagaaggaaa tttgccactt 17580

ttttcccttg tgtggaaaat tttacagaga taaccttata cagtttgcct agttgcctga 17640 

agtacggctc tgaggcagga atagaataat tggagaaagg tacaataatt tgtacataac 17700 

tatgaacgtg atacttattt caggaaaagt attgtttgag aattttttta tgggatattc 17760 

attaaggaat catcagagga cggcctcatg taggtttcta gcattctctg tctttcatct 17820 

aggtctatgc tttccagtgt accagccaca gatggctata tattagatta gcaaggggga 17880 

gatgggttct atttcatgat cacattaaaa cttgagaaat tttctggtga ttcaagccta 17940

aagctttata ctagaggtta gacatttaag caggagttct tatatttagc cttaaaaaca 18000 

gataattatg aagagaaata tgattgtttt atatttctct tccataatta ctttcatttt 18060 

taatggtgtt gcatttccaa atcttaggtt gatggataag actttctttt tagtctaatt 18120 

gcgtaagtga actaagttgt cacgaatttg agtgtcacta tttgttatat catgtagatg 18180 

agttgcattc tgtcttctgg aaggtgctaa aacagggatt tatcttaggt gatttatcag 18240 

atacgcatac aaaatacaaa attactttgt agaattactt gactaaatta aacctacaca 18300 

tagacacaag tatacatgaa atcataccat taacaataca aattttttca tctattgcaa 18360 

ctcatgtgat gttctagaaa caaagatttt gcattttttt gtttgttttc tgtagttaga 18420 

tggttgttac tgaaaatgtt gggcaatttg gggtggtttt ctctgggttt tattcttgat 18480 

catagactgt tcagattaag tgaaatcatt tgataactat gttgaaccct gtagttttta 18540 

gcagattctg atacttcttt gatgctcagc ctgaatcatc cacttgctcc tgtgagaatt 18600 

ctggccatga atcatttgaa aaagatcatg aaaacatcaa aggtttgttt cagatcgctt 18660 

ttttatatgg tttttttttc ttatgatgag cagtgtaatt ctgaataaga aattgaagtt 18720

accacctaag tgggtaatga tacagatgag ttgaataaac ctcatgagag taccttagac 18780 

taatatttag gactgtgtat gccagtcgtc cccattttaa caataaccta ataacaacta 18840 

cttttatgtt tccacattga cagttgttta tctttagtgt ttttcaaact ggaatttgta 18900 

gattaattca cgtaggttca ttaattatct gagaattttt ttagttttga aatttgattt 18960 

tgggccggtc acagcagctc atgcctataa tcccagcact ctgtgagaag gatcacttga 19020 

gcccaaacgt ttgagaccaa cctgggcaac atagggagac ccccttctct gtaaaaaagt 19080 

tcaaaaaata aaaaattagc caagtttggt ggcacacgct ttggtcccag ctacttgaga 19140 

ggctgaggta ggaggatctc ttgggtgtga gaggttgagg ctgcagcgag ccatgatcac 19200
gccactgctc tccagcctgg gtgacagggc aagactcttt caaaaaaaaa ccgagaaatt 19260
tgattttgat agtcttaatt ttagatgaag acatattttg ggacatttga atgataactt 19320
ttgaacactg ttccatgatt ttggtgcctg cagtttcgtt tgtctttaaa atgttagtgg 19380
tttcaaatat gtattgaggc tgactttatt tgtaaataat ttggtctaat ttgcatttat 19440
accaaaaggc attggtatga agacttggta atgatgcaca tagaatagta gtttccaaac 19500
tttagtcatt caagcagtcc ctgtacaatt tttgctaaat tacaaattac cactgatagt 19560
agtattattt tctcacaatt tttctttatt tcagatggac tcattatttt tttacatagc 19620
cttcaaattg tgggtgtgag gtgctaattg tatatgcatt ttactatagt tactataatt 19680
aaaataggaa tgctccatct gtgtcatatc taaaattatg caccgtgatt tgcatatttg 19740
agaaatttct ggaacagagt aaagcggtta acagagttta gtttccttgt agcacatggc 19800
agtgtaaacg tgccgaactc tagaacttgt gtcatagcaa ttagaagcag attcttacta 19860
caagtaaaat catatgatca gtaaaattta atttcagcta attacagcta atcacactaa 19920
gttcatcttc atttgaattt tattggtttt atagagtttt tagtatttcc cttgtagatt 19980
taaaaaaaaa ttttttgttt tatgtcatct atattttaag aaggagttat acttaatttt 20040
aagttgatat ataaacaaca aacaacattt agatctttga gaatgtttct ttgaaagtga 20100
ctgatgtaat attcaggttt gagaaatgtt gctttatttg atctgacatg aattctgctt 20160
ttgtaaactg ccaataggtt tattttgtcg atatgttaaa taaattcatt gaataatagt 20220
ctttgttgtg ataggaggat ttcagtgata agaaattctt ctagaataaa ggggagtcct 20280
ggaatcttct gtttctgaag ttattctcat ccttccttct tttactgcct ttctctcctg 20340
ttctctgggt ttggttcgtt tttcggtgaa gggccggccc tgcatacatt caacttacag 20400
gcttgaggat tgtaattgca gttgatgtcc tcctttgcat accatgtttc attttctttt 20460
ctccattgat gacaggacat aggttttatt ctaaactcca ggttcagata tatggctgct 20520
tattagctct tgaggatttg tttaaatgat ggggtcttgc tatgttatcc aggctggcct 20580
caaactcctg ggcttaagca atcctcccac ctcagcctcc agagtagctg ggattataga 20640
catgcaccac tgtgcctgac taacttaatt tttgtgaatt ctgtgaattt tcatttttct 20700
cccaacagag tgacctatgt tttttatatt atttagttgc ctgccacata tagataggga 20760
tatattcagg ttgattagca gtcctagttt tctttctctg tgaatgaata tacattgttt 20820
tcttttattt gttatattta aaaaatgatc tctatttttg gcagactggc tttgtagtac 20880
atgtccttat ccttgtaaaa atttagataa tactagctag cttttaacag tactatgtgt 20940
aaagcactgt gctacagact gacatgaatt ttttaatctt ctaacaaccc ttgcaggtag 21000
gtattattta ttagcaaacc tgttttatag gtgagaatag tgagatccag aaaggttaag 21060
tgacttgact aaagttacac aactagtaag tggtagagct aggatttaaa ttagacattc 21120
ttattcttta gtctatatcc cgtcaatatc tgggctctca ctttaataag atgaactcct 21180
taactacaaa aactatattt ggagtatacc atatttacta tattttcact gaaaacttga 21240
aaaggatttg taccaaacaa ttattaatcc aaaaggccct ttagcgatta tctagttcta 21300
gtgtgtttca gtcctcatcc agagcagagc ctgtgttaga ctgtcattct ggagtgtttt 21360
tcacaaaacc atacttcctt tctagtgcaa agccttaaga gatttttggc aacagctaaa 21420
ttaatgtttt tcttttattg ctcaggaggg tgttgatgaa tctttcataa aagaagctgt 21480
tttagcccga ttaggtgatg ataatataga tgttgttttg tcggctataa gtgcttttga 21540
ggtgagtgaa tttgtcattt gttgaggtat acaattttgt aaatttcctt gcccttctca 21600
cttgattctc tgcttgagac tagaatattt ttggaaatta tttttccatg ggaaatttgt 21660
tctctggaga gcacagactt aagtctggtt aggagaaact cacagaaatt cacaatttga 21720
tatattaaat atatatggaa gttgttcaaa caggaaaaat gttttgtgaa tttttatatt 21780
gcttttaaca tatttttatc aatagcaaat accttgggac ctgtggaaat tgtcttatga 21840
atgccacttt tgagaaatag atcttaaaac ttttaatttg tactaggatt cctgggagaa 21900
attccttata attcattctt tttaaaattc gcatttaaaa agcgccatac atgtttgaag 21960
caagtacaaa tctactctgg atattttttt ccctttagat tttcaaagaa cacttcagtt 22020
cagaagtgac gatttcaaat cttctgaatc tctttcaaag agcagaactt tcaaagaatg 22080
gagaatggta tgtattcatt tctcctcata ctattttgaa ttgatgaagt tcttatatca 22140
agttctttct atttcttctt cttttttttt taaggctttg attacatgac ccttttgtcg 22200
tggtagtaga acagatctta ctgtaatact ctctttcatt atacagttaa ttaatttgtg 22260
tttgtgtgtg ttcattatac tgttctctat gtttggaatt tctcttaata aaatgcaaaa 22320
ggaagtatat aatgtatgat attgcattct gggaagtaag atgggtagga ctttctaatt 22380
tgctgcttta tgtaattttt actttataat taaaaacaaa taccaatgtg ttaaacaaat 22440
taaaaaataa gttcagagac acatgatacc atctgtggta agtttttaat gagctctgtt 22500
tatttcattt ttggcaccaa agctttttgc aagaaaaagt aagcaagctc cataggggat 22560
gagagtgtgt cttacgtgtt tttgtatcta tcgtatgggt agatatctat catgtaggat 22620
gtgcttggtt catttttttt gagtgcctga gtgaacctga tttccggtag agttgagaca 22680
tctagtgtga tatgtcttat gttatagtta acctctagtg atttctctaa gaaaaattat 22740
tatttttttt ttaagattag acaggatgag cagattaaaa gtattgttgg ccaggcacag 22800
tggctcatgc ctgtaatccc agcacttgta caaaaaataa ataaataaaa atattgtcac 22860
agataactct gttattggca tggattttaa gtggtgtctt agtgtatctg aataatagtt 22920
ttggtcagga gaccaggata ttgtcagtta agctggagta atgtaatgat aaatttttgc 22980
ccaattgaag ccttccaatg tgtctttcaa ccaagtgtgc ttgcatgtct tttaggtacg 23040
aggtacttaa gatagccgct gacatattaa ttaaagaaga gatactgagt gaaaatgatc 23100
agttgtcaaa tcaggtggtt gtatgtttgc tgccatttgt ggttatcaat aatgatgata 23160
cggaatctgc tgagatgaaa attgctatat atttatcaaa atcaggaatc tgctccctgc 23220
accctctatt aagaggctgg gaagaaggta aaaaattcag ttgttttttt agaaaaagtg 23280
aacagataac acaaaagaaa acaaggaaaa tttaaaattt attgttggtc atgctatatg 23340
ccagtaaggt cttgtaaaat ttttgccaag ataaaaagta tacagtgaaa aggaagccaa 23400
ttcatagaca aattcagaat cggatacaag gaaaaataaa aagcaaacct tcataaacaa 23460
tagtaaaatg tttgcctgaa ttcttcatgt gattctttga gtgttggtga cataagagct 23520
ctctgcattc gattttactt tacagctctt gaaaatgtaa ttaaaagcac aaagccagga 23580
aaactaatcg gtgtagcaaa tcagaagatg attgagttgt tggctgataa tataaattta 23640
ggagatcctt cttcaatgtt aaagatggta agtatgcttt gaaagtccac cccctggatt 23700
tctttctcat actcttatta aattctcagc ttttgcttta ctagattttt cttaaaaaaa 23760
wtttytttta ttcttgagta cttggtttta ctttaaacgt tgcagtgttg tttaatatat 23820
tctgtgcaat gtttgaaatc ttttaacatt ttatatattt tttcaatgtg gatccatatc 23880
acacccataa gataatatat cctttttacc accatttttt ctcattccca tttaaaatat 23940
tagtcttttt gttgttgttg ttgttaaaat agcaaaatat aatcccaaag aggagaaaat 24000
tatctttgcc ttattaaata aacctttgta aaggattaga tgtgtgattt aaatacatta 24060
accagttggc aagtgcaatt atgttcatta cagtttctga aattattatt ataattactt 24120
agaaatcagt gcttagtact ttataaaggt tttctatttg tcatctcact ttaaatgtaa 24180
aacacataat acctagatag acctgagttg aaaaaattgt tttatgaaaa tcaagagcaa 24240
ctggtaatag agtgtgttta tttttcaagg tggaggattt gataagcgtg ggtgaggagg 24300
agtcctttaa cctgaagcag aaagtaacgt ttcatgtgat cctgtctgtg ctcgtctctt 24360
gttgttcatc tttaaaagaa acccactttc catttgcgat aagagtcttc agtttgttgc 24420
agaaaaaaat aaagaagctt gaaagtgtca ttactgcagt ggtaaggaaa gcagaatctc 24480
cagaaagtgg gatgttgaat aaaacaacag cctggcctat tagtcacagc aaaactgaaa 24540
atcagaggcg agatgttgag tctgtggata gtagctggat gaaatgtgaa gcagtcttgg 24600
gtagcagtga gatagtcaac taggcactcc cttgcttttg cggaaagttt agttgaatta 24660
taaaatttta gggttgcaat attttgtatc tgatgcctga ctctcctctc ttttattgtc 24720
tttattggat aattacccaa gctacaccta actacctcta gcgatgaaga attcacttgg 24780
cgtgttagtg cttgaacaaa atactatttc attcaattgt ttactaaaaa tagatctttt 24840
aggctttaaa taatgttaat attacataga tgtgtatccc tccagcctac tactatcttt 24900
catagtatta ctaagggaca tgtttcagct gtttacattt gacatgtttt gaggtagcaa 24960
gacttttttt ttaatatctg ctatatgtaa ggattttgtt tttatttgtc ctccaaattt 25020
tagagggctt ctcattatag gatttactaa atgaattata tctttttcac atttacgtat 25080
acccgcagga ttttataagt tctgatcgtg gccattgtta gctaaaatgt gtaaacttta 25140
aaaagagtaa agtcttcata ttttgaaaaa tagtccagat aggcccttct taacttttag 25200
tttcttggtg tcttttgtga agtatgatga cctgaactat actggttctc agtgttaaaa 25260
gtatggacaa atagagtgtt ttattataca tatgtagata acaggttatt ttatattgct 25320
ttacttttaa tacctttctc gaggtcatga gcatgttttc gcctgtagca gcatgttgag 25380
ttaatcaaat ccctcactgg agacgttaca ttgtaaatct ggctttggat tattttcctt 25440
ctaagttatt actctgtact gcttgtacag agtcacattt gtcattttct tacacctgtt 25500
ctgcattctt ttggcactta cagtataatt tctaaacacc taacattttg ctgcttgttt 25560
aaaatcatct gcaggtttgg agatttttgc catgtacctt attcagatta gtggtaaaac 25620
agtgaaatga aagaattccc tttaactatg tcttggcaat ccaactgttt atcttcctgt 25680
tttgaaatat acctattttg gccaggcacg gtagctcacg cctataatcc cagcattttg 25740
ggaggccgag gcgggtggat catctgagga caggagttcg agaccaacct ggccaaaatg 25800
gtgaaaccta atctctacta aagatacaaa aattagccag gcatgatggc gggcacctgt 25860
aatccttcct actcgggagg ctgaagcagt agaattgctt gaacctggga agcggaggtt 25920
gcagtgagct gagattgtgc cgctgcactc cagcctgggt gacagagcaa gactccgtct 25980
caaaaaaaaa aaaaaaaaga aatatgccta tttattccta ccctttattg gataaacagt 26040
tcatatgaat aaagtttgtt tttagagata atgtttcaag gtaagacata gataaaattt 26100
aactttgtct ttccttgtga atagtgataa acctctttaa aaatccgacc aaagcagttc 26160
tgttaatcct ttccccacca aaatgcacac ataaacaaaa cttttgcata ttatgcagag 26220
gaattcatat aacccatgtt cagaaccctt agtttgtact agtttttgtt tcctttaaca 26280
tttcaggaaa tcccctcaga atggcacatt gaactgatgt tagacagagg gatcccagta 26340
gagctgtggg cacattatgt agaagagctc aacagcactc agagggtggc cgtggaggac 26400
tcggtttttc ttgtattttc cttgaaaaaa tttatttatg cactgaaagc tcctaaatct 26460
tttcctaaag gtaagataca agctgtgaat cttttaaatg tgatactgtg caaaactgaa 26520
tgattgactc cacaatgact gtcttgaata aaaagctccc agcatgttca tgtaggatct 26580
agcatcattt ctctttccat gtccttgtag gtgatatatg gtggaatcct gaacaactga 26640
aagaagacag cagggactat ctgcacttgc tcattgggct gtttgagatg atgctcaatg 26700

gtgccgatgc tgttcatttc agagttctga tgaaactttt cataaaggta caagcctcct 26760 

ccttttcagg cgtactctac ccttaaagaa gatatggctg gcctgcaatc ccagcacttt 26820 

gggaggccga ggcaggcgga tcacttgagg tcaggagttc aataccagcc tggccaacat 26880

agtgaaactc catttctact aaaaatacaa aaattagcca ggcatggtga tggggcctgt 26940

aatcccagcc acttgggagg cagaggcagg agaattcgct tgaacctggg aggcagaggt 27000 

tgcagtgagc cgagatcaca ccaccgcact ccagcctggg tgacagagtg agactctgtc 27060 

tcaaaaaaaa aagataggct gggcgcagta gctcacgcct ataatccgag cactttggga 27120 

ggccgaggtg ggtggatcac gaggtcagga gatcgagacc atcctggcta acatggtgaa 27180

acgccgtctc tacgaaaaat acaaaaaaaa ttagctgggt gtggtggcgg gctcctgaag 27240

tcccagctac tcgggaggct gaggcaggag aatggcatga acctgggagg tggagcttgc 27300

agtgagcgga gatcgctcca ctgcactcca gcctgggcga cagagcgaga ctccatctca 27360 

aaaaaaataa taaataaata aaaagatact gtgggttatt tgggttatga cagtttataa 27420

tgtaaacaaa agtagacttg taaacagtga agagtatata atagcacggg ccctgtaaaa 27480 

gagtacaatc acaatactaa tcggtaggtt gaatagaaac taacattgct acactcctac 27540

taatatgtta ggcccattac tatatttaaa aataacaaca gcccagtctt tttgattggt 27600 

taataccatt taaaaatgtg aatactggtg ttcagagagg ttatttagca cacgtgaagt 27660 

tactcaggaa atttcatccc cagcaccagt gacctccatg ttgccaggcc cagtggtcag 27720 

ttcctgttct tgcctgactt gttctctcta cagcatttga catggagttg gcattgctca 27780 

gcacattgtt gaagtatttt cttaattagg cgtcccagat actgctcttg tcttggttct 27840

gtgttttcaa gtgcaaggtt atagtgaggc ttctgaatat cactcttcat tgacatcttc 27900 

ccagttcatt aaaacatgga atatggarta atttgaaatg attagacaaa gcaaaacttt 27960 

gccttaaatt atttgtttct ttataattgt acactagtga gaagaggatt tggaaccagc 28020

ttatttgagt aagaattaaa agtttgcttt tttttttttt taattaggtk catctagaag 28080 

atgtttttca gttattcaag ttctgttctg ttttatggac ctatggttct agcctttcaa 28140 

atccactaaa ctgcagtgtg aaaacagtgc tgcagactca agctctttat gtgggctgtg 28200 

caatgctttc ttctcagaag acacagtgta aacaccaact ggcatccata tcttctccag 28260 

gtattaaaag tagcattgtt atgttttgtc atgaccaact gttcatgctt tctctatatg 28320

gtttatgtag tttggaaaat tataacttaa aaaaaaaata gcagagggca ggcgcggtat 28380

ctcacgcctg taatcccagc actttgggag gccgaagcag gcagatcatg aggtcaggag 28440 

atcaagacca tcctgggcaa catggtgaaa ccccgtctct actaaaaata caaaaattag 28500 

ctgggcgtgg tggcgtgtgc ctgtaatccc agctacttgg gaggctgagg ctggagaatt 28560 

gcttgaacca gggagttgga ggttgcagtg agccgagatt gtgccactgc actccagcct 28620

ggtgacacag cgagactcca tctcaaaaaa aaaaaaaaaa atcgataaag gccaggcagg 28680 

tggcttacac ctatagtctc agcactttgg gaggctaagg caggaggatc acttgaggcc 28740

aagagttcaa gaccagcttg gacaacgtga tgagatcctg cctctacaaa aaatacaaaa 28800 

attagctggg cgtgatggta cgtgcctgta gtcccagcta ctcagttgcc tgaggtaggc 28860 

agatggcttg tgcctgggag gtcaaggctg cagtgagctg tgattgcaca ctgcactcca 28920 

aactgggcaa cagagcaaga ccctggctca aaaaaaaaaa aaaaaaaaaa gggctgtaca 28980 

tctgtacata tttctggtca cttctaagta ttgcttttaa gttactggca ctagcaaaat 29040 

ctcaccaaac acaaaagttt aataatagaa atggcaggtg attatatgag ttcttaataa 29100 

tcagctattc attttctgag ctgtagcagc aatgggttag atatggaagt cagatcatca 29160

taatacaatt taataatgag ctaacaaaaa caaaaccctc ctgcatttat ctcttttatc 29220 

gggcataaat catattcagg ttcggcattc tctaatgaaa taactgaaag ttaagttttt 29280 

ttataatgta gaattgcaaa ttttgcaaaa gatgcagaat ttcatgaaac aatgaatgcg 29340

atattgaaaa gtgggttgac ttacagagct attgacatga ggggatgtgg cagaattaga 29400 

tcagagaaga atggttgaag atgaaaaaat tgattagaat aatggacatg attaatgctt 29460

caaactgtaa tgttattaat atcagaggca taggagtggc cctggggaaa tttgaagatg 29520

tccttggttt ttttttgttt tgtttcttgt ttttttttta attccaaaat tactctttgt 29580 

aaatgttctg tgaaagcaag ttagatgatc gtatactttt ccctgaacaa ttcctcagtg 29640 

tttatttttt gttagttctt tgaaagagca cataattaca tttgtaacca aattttgtta 29700 

catttaaagt agttatgtaa tttcaagttt tatttctaaa ataaaattcc acaaaagtaa 29760 

ttttttattc gcttacaaaa tttcggtctt tattttaggt gagtttactg ttaaggtttt 29820 

tctagttata tttggataag aacaactcta tacatttttt aactttaatt tttcaattaa 29880 

tatggacatc tttccatatt aataaatgga aaaatttatc tttttacaac tgcctgcaga 29940 

gtattccatt gtatggagat agcaatttat taaaatagtt attttctgct aaacttgtgg 30000 

gttttcaaaa aggtacatat attttattta gagataacta tttctgtttt ccaatttggt 30060 

rttaaatgtt tactgtcctt tgacagtggc ctctgagtca tatatgtcat tttgactgta 30120 

aatagcaaga tggaagaact tcagttttcc attggtattg ccctttgtgg gcatgcttgt 30180 

gtggaagaat ttgggctttg tgaaaaagat acattgcagc tgcagggggc tgggagggtc 30240

tttttttctg tggggacact gaataaacag ttgtatgcct gcattttcac tgcagcccgt 30300 

cacatgaggt caggtgtgga attttccact tgtgtcatat tggtgctccg aaaatttgag 30360 
attttggaat ttttggatca aggttgctca acctgtatga acttcttagg caggtttgaa 30420

accttgtcaa gctttatctt agtatttcag aaatggaaag tataatggaa aatttctagc 30480 

aaaatataaa tgaccgtagt ttgctgcagt agatatgaaa taaagcttaa ataactaatt 30540 

actgtaaaag gaatttagat atctctttgt tccaaagaca tctttttaaa aaccctagcg 30600 

cagttggtct cacagatgca ttcctactag atctttgaag cagatttttt caatgttgtt 30660 

gaactgggca gggcatagag aagggaagct tccagattct ttatttaaaa ctggcataac 30720

attgaaaatt gagaaagcct acagacaaaa actataggtc atgttgcttt gaataatatt 30780 

atgtaaaaat cctaaagaaa atattcgcaa ataggctcta gcagcatatt tataaaaatc 30840 

atttacctag ctagagttca tgcccagaat ataaaattgg ttcagtaaag ggaactatgt 30900

taatataatt gactgtatca gtgaaggaca aaacccataa agtgtccaaa gatttgaaaa 30960

ggcattagcc aaaaatcagt aatcctgatt gacagcagaa taggagtaac atgatcaaaa 31020 

attcctcaaa tcaaaagcta gcgttttcac ttcataattg ggaaaacaaa ggcattcctg 31080 

ctgaaatcat cactctcatt gaacatgatt ctagaagtac tagccggcag aaatagacaa 31140 

gtttcaaata cttatggaat gtatgaatat aataaaggca aaatcaaaga tctgtgagat 31200

ggttatgtgg attgttaaat gggacatctg gctatccatt tggaaacatt aggaagtttg 31260 

actccaacct cattccttag atagaaattt catgtgaact aaagatttaa cataaactta 31320 

gctctggcat tagaagaaag catgaggtac atatgatctc tccctctaaa tatagagaga 31380 

ggtactttct aaattcactt agagtgagga gggcctttct aagcaccaga aaaaaagtag 31440

aagaaaaact tgactctaac atttctctat caaaaagtaa aatcaaaaca caccagtctg 31500 

ggaccaaata tttgtagagt acgatgcaca ggtgggtaat ttcccgtgca cgcttgtaca 31560 

aggctgtatg tgcagcggtg tttgttgcag cattgttttg atgacaaaag aatagaaaca 31620

acctaaactt taccagtaag gtaccggttt ttaaaatatg gaaaatacat tggagtatac 31680 

aattaaaaaa taatgatttg gccgggtttg atggctcaca cctataatcc cagcactttg 31740 

ggaggccgag gtgggcggat cacctgaggt tgggagttcg agaccagcct gaccaacatg 31800 

aagaaacccc gtctctacta aaaatacaaa attagccgga tgtggtggtg cgtgcctgta 31860 

atcccagcta cttgggaggc tgaggcagga gaatcgcttg aatccaggag gcagaggttg 31920 

cagtgagccg agatcgtgcc attgcactcc agcctgggca acaagaggaa aactccatct 31980 

gaaaaaaaaa aaaaagaatg atgtaatttg gatatgtatg tacttatgga aaacattcgc 32040 

aagagtttag atgtgtctgt aatcccagca ctttgggagg ccaaggtggg tggctcactt 32100 

gaggtcagga gttcgaaacc agcctggcca acatggtgaa acccagtctc tactaacaat 32160 

accaaaatta gccaggcgtg tggcgtgcac ctgtaatccc agctgctcaa gaggctgagg 32220 

caggagaatt gcttgaaccc aggaggtgga ggttgcagtg agccgagatt gcaccattga 32280

gcaatgtcca gctcatatgt gatagcagtt ggtgtagaac agagtttata gtacgctctt 32340 

atttgtgtat tttcaaaaga atgcaccagg taacatatgc ctaacctttc tctggaatat 32400 

taaatattca aacagtgaac agagaaggag tttgaattcg aagggaaact ttttatcaag 32460 

tacctttggt actatgtaaa ttttgtactg tttcagtttt taaaattgtt tcttaaattt 32520

ttttttcctt tgattttagt ggtgacatct ttactcatta acctgggaag ccccgtaaaa 32580 

gaagttcgta gggctgccat tcagtgtctc caggccctca gtggagtggc atccccgttt 32640 

tatctgataa tagatcattt gatttctaaa gcagaggaga tcacttcaga tgctgcctat 32700 

gttattcagg taagctcgtt agcaaagaag attgagtgtg ttcatgccts tgtgtactgg 32760 

agcaattttc tccattctcg acatttcact gtgtagggat tttgatggac taagtcattg 32820 

gatttaggaa ctgataggtc aatatctgta catctgtgtc cccctacttt tgaaaagata 32880 

gcaatgtcat atttcaattt tgttcttttc attggataaa ggattagtgt gtagcttttc 32940 

atagtaagcc agctgtgtgt ttttaataac gtttcttatt aaggttttaa ggagaaatgg 33000 

aaaattataa acaatgtcac atcatttggg taagagtctt ccgtaattaa tatatttagc 33060 

ctttgtagtt tcttaaccct tttttatcca attttctttt tggtatctta ggatttggct 33120 

actttatttg aggaactaca gagagaaaag aaactgaaat ctcatcagaa gttgtctgaa 33180 

actttgaaaa acttacttag ttgtgtgtat agttgcccat cttatatagc aaaagatttg 33240 

atgaaagtac ttcagggagt caacggtgag gtatgtgcca tttaagtgga ttgtacattc 33300

ctgtgatgtt actaaagaat tgttgtaact gcttgtgggt gaagaagaca attgtaagag 33360 

ttgcatgtgt gcactcagaa gagataaatg taatattaat ttgagagggg gatgagttga 33420 

ggtagagaaa accagataca agatacaaag acaagtactg gtgagtttct ttttatttta 33480 

aaattaagtg tgttgtccct gtctaatcat taagtttatt ttaaatgaac ttaacagtct 33540 

cgtgtctcat ggggctgcac tgctgaaaga agctgttgct tcatctctgt ttttcaagtg 33600 

aaaattctat acaaaggaat atatttggac acgagtctat gaaaattcca aaatcatttg 33660 

ctcgtttatt atatatattt agagctttca ttgtgttttc gaagaagctt gtgttacaaa 33720 

agaaagcagt tgtgaaacac ttgttgatgt gatggcatgt gctgaccaac ataaaaaagg 33780

ttctttcatc ccagtgttac agaaatgcct gtgtgcttca tcagtcctag actgactgtg 33840 

ggcttagaga actaaaatga gccaaatgga ttgaagacca tttgattaca ctttcttggt 33900 

aggaacaaac aggttgcatt cagcccttag ttgtccctgg tatagtttaa attaataatt 33960 

gacaaatttc ttgggtaaag gaccagatag gaaatatttt atgctttcgg ggccacacag 34020 

tctctgttgc agctactcat ctctgcagtt gtatcaggaa agcagccaca gacagggtgt 34080 
aaacaaatga gcatggctgt gtttcagtaa ataaaattta caaaaacagg tggtctgttg 34140
aatttgacct gcaagccata atttgctggc ccctagttta gcagttaaag atttacatac 34200
ataatttgaa ttaatattaa ttataccaag ttggaaacat tgaaggacac atacagtgtt 34260
ttagggaatc ttaaacttta tttttattct ctaaagtttt agtttgtgtt ttctgttctc 34320
aagtcttgag gaatgctcac agattatctt tctgctcatc tctgaagggt ttagcacttt 34380
tgagcattac acatgaactg tcagaaagga cctgagtgcc ttagtgtgaa ttaatagctg 34440
tctccatgtc taagaagtta ccatcattgg gttttttcct ccaggctaaa ctcccatctt 34500
ccagcaggct ttgagtgtct ctgtagagtg agcaaaaaca tatcagtata cataactatt 34560
tactactaag tctttggatc tttagatggt gctttctcag ctattgccta tggctgaaca 34620
actgctagaa aagatccaga aggagcccac agctgtgctg aaagatgagg ccatggttct 34680
gcatctcact ctgggaaagt ataatgaatt ttcagtttcc cttttaaatg aggatccgaa 34740
gagtctagat atatttataa aagctgtgca cacaacaaag gaactttacg cgggaatgcc 34800
aaccattcag atcacagccc ttgaaaaggt cagtgatttc tttcagaaac taaggcatag 34860
catgttcaag catttaaagg tagaaggcaa caggaacagc gttaaaaaaa aaaagattaa 34920
tattttattt ggagaagata gttacaggtg gactagacgt tttaaaggaa aatgatgtat 34980
ttgacccatt tttgaaaact gaacgttgta catgggtcat taaaaagaat tgcattttaa 35040
ctgacaaaat taagtgggaa acaattttgt attttgagag aggttgatat aaaaattatt 35100
tctggaattt aaacttgttc actgaactca gttgcctttt ttctaccccc tttagattac 35160
aaaaccattt tttgcagcca tatcagatga aaaagttcag cagaagcttt taagaatgtt 35220
gtttgattta ttggtgaact gtaaaaactc acattgtgct cagactgtca gcagtgtttt 35280
taaaggggta agctgcaaac ttcttgtaat tttttttatt aatgtttata gtctcttcca 35340
agggacacca tgtttaagag tcctgtattt tatagtgctt gtaagaggtc agatgttacc 35400
attaacaaaa atcatggaaa agtaagtcag agagaggcat aaaataaaaa ttttaaatct 35460
ttaccattat tttattttat gaagtatttg ttgataattt taaccagata ggatgtgtag 35520
gaatatgtgt gtgtgtgatc tttctcatat tagacggaat ctacctattc ctttatgcaa 35580
ggaataggaa tgagtcaatg gctcagtatt gaactgggta acattattct acttaattta 35640
aatagttgtt taatgacata taatggaatt gaaattctgc ttgttggttt tatacgtata 35700
ttgaaacctt gttattatca gaatagccac ctggttattc tcttttttga atatagaata 35760
atacaaagtt ctagtatcta gattcagcag ttctcagaat tttaacatat taatcttttt 35820
tccttctttt tctttgctga aatattttaa agtaaatctc ataatgtcat gcactttcat 35880
ccctctgtac ttcagtgtgc atcttttgaa aagtggatat gactacaatg ccattagaag 35940
taaaaaaaaa ataattcctt ggtattatct aatacctagt tcataatcat atttccatgg 36000
ttatcacaaa agtatttttt atagttgttt tctttgtatc agaatccaaa gatttagttg 36060
tgatgttaga accttataaa aagagatatc tggatcccac cccatatcta aagaatctct 36120
ggggcagtaa atctgttttt gtgcatgtgt gtatatgtct tattccttcc tagctgatta 36180
gaatatacat atctgatccg gcacttggaa acaatcggta tggttgattc tgtattgcat 36240
tactagttgt ccactctaag ggatgagcgt gaaattgagt gactctggac tggagaagtg 36300
atactgtatc ataatcaagt aaagtgcagg gctgtaactt gagactccca agacaaattg 36360
tatagaaagc aagtgcatgg tagtaggaga gcttggttaa aatgaaaatc tgaagttgaa 36420
tttatgttgg ctagagcttt tagtgtatac tcttccctac tttgctttct tctcattctg 36480
gtttttcaag aagtctgtaa caaggtagta gcatccaatg ctgggggggg ttcagatgca 36540
tatccagtaa gtcattcttc cctcttaatt gaataaatcc ttcctagggg ttaacattta 36600
ttgctctaat tttatcccaa tggattatat catcaactct tcttccttca atatttcaga 36660
tttccgttaa tgctgaacaa gtccgaatag aactggagcc accagataaa gctaaaccct 36720
tgggcacagt tcagcaaaaa agaaggcaaa aaatgcagca gaagtaagag ctacaaatgc 36780
attgagcaca tgctgttgtt ccctgactct gaatgttatg aaaccattgg aactttttga 36840
tactcatcga tttgctataa tcaggcacaa tggtttttac ttagcacatg atagagtttg 36900
cttttccaac gttttgtctt ctccattttc cagaaaatca caagatctag aatctgttca 36960
ggaagttgga ggttcttact ggcaaagagt aactctcatc ctggaattac tgcagcacaa 37020
aaagaagctc agaagtcctc agatattggt gccaactctt tttaacttgc tatcaaggta 37080
atgacacttg gctttgatct gtgagagaat aggactctaa ttcttgccca tcttatctaa 37140
cattctcttt tcaacatact cttctcttgt gttagatagt ctgtaaaata aagcaagtac 37200
tataggtgaa attaaaagat tggttttagg tatccattgc tggtgcccca aactgacctt 37260
agtgtcctgt gtcatgtagt aaatggccac tcactgtaca gcctactttc ctctgaactt 37320
ttcagtagcg tgcctgggac gtcattactt gcatttcaag atatagtgtt gaatttatat 37380
agatatagat atgtgtgtat atagtgtttg ctatctaaat atattttcat atatatagat 37440
gtatatattc tatatacgta gatttgtgtg tgtgtgtgtg taaattgagt atgaattgta 37500
gcgtgtgctt tcatgtgaaa ggcaaagtaa ttctgaaggt tgtcctgtaa gattccagat 37560
aggctaatgg aaagtgcaat agagaggacg tggaatttga catagcgcct gtctatatga 37620
gagaaaatta gagttagaaa tcttatatct tatttctaga cttgatgttt aatgatagta 37680
ttttagttat tccttattga atgagttcct tattgaatga gttctggaca atggccatta 37740
atgtaacaat gaattgtaca acttttacat ttcccatcat ttctcttttg ggtgtcctga 37800
agatgtttag aacccttgcc acaagagcag ggaaatatgg aatacaccaa acaattaatt 37860

cttagttgtc tgctcaacat ctgccaaaaa ctatctccag atggtggcaa aatacccaaa 37920 

ggtaggtact ttttggaagg aaaggttaga aaaattatct actttgagcg tgagtgtacc 37980 

tgccacttag ggtttttgtc tttctcacca ttgtagatat tttagatgag gagaagttca 38040

acgtggagtt gatagttcag tgcatccgcc tttcggagat gccgcagacc catcaccatg 38100 

cccttttact tttgggcact gttgctggaa tatttccggt aagcgttaat gatataagat 38160

tttagcagat atttcagata tttttcttcc acagaaatgc actggcacct gttgatttta 38220 

aatttaatca attaggcttt ggaatcttcc agcactggtg agactctcag ccctgcctct 38280 

tcatagctgg gactgtataa gccagttgat cttgctaagc tttggtttcc ctttttatca 38340 

agtgggaata ataaccttct tcactggatt gttgggggaa ttcagttgga taatttgtgg 38400 

ttagtactta gcataatacc tggcacacta tctatctaat aaaacagtag gtattggtgt 38460 

taattgggaa aagttaatat ctcatttatc ctcaagtaat cctcaaagta acctttaacc 38520 

agtgacttca gaaaattaat ttgatcataa actatgtcat ttaaagaaat cacgtccctt 38580 

ttatgaaaaa tatcatttaa acatttgtaa agagattgaa gctattttca tattgaggat 38640 

ttgagtcaaa gctccttaac ctggggttct ccatgagctt taggggatct taatctactg 38700 

aaatcatatg caaagtttta tctgtatggg attttttttt ttcttgtaag agagtctgtg 38760 

cttctcaaag gggttgataa tccagaaaca gttaagaatc attaatattt tttctgtttt 38820 

tattgaaata tagttcagat gccataaaat tcacccattt aaggtataca attaagtgat 38880 

tttagtatat tcacgaagtt tctgtaatca ccaccacgat ctacttgaag agtattttca 38940 

tcaccccaga aaagaaaccc tacacccatt aactgtcact cttcctctcc ctcctccctg 39000 

ccgctgccac cgtcaccacc agcaccacca ccaccaccat gaccaccacc ctgctcctgg 39060 

ccactagtaa tccattttct gtctctctca atctgcctat gctggacatg tcatataaat 39120 

tgagtcatac aatatgtggc cctctgtgtc tggtttcttt tgcttagcat aatgtttttg 39180 

tttgtttgtt tgtttgtttt gagacggagt ctcgctctgt tgccgaggct ggagtgcagt 39240

ggcgcgatct cagctcactg caagctctgc ctcttgggtt cacgccattc tcctgcctcc 39300 

gcctcccgag tagctgggac tacaggcacc cgccaccacg cccggctaat ttttttgtat 39360 

ttttagtaga gacggggttt caccgtgtta gccaggatgg tctcgatctc ctgatgtcat 39420 

gatccgcccg cctcggcctc ccaaagtgct ggaactacag gcgtgagcca ccgtgcccgg 39480 

ctgcttagca taatgttttt gaggttcatt cttgttgaag cttgaatccg tattcctttt 39540 

ataactgaat aatattgttg tatgtgtatg ccacattttg tttatctgtt catcggttga 39600 

tggacatttg ggttgtttca actgtggggc tattataaat tatgttgcta tgaatattgt 39660 

agacatggtt ttgtgtggac atgttttcat atctcctggg tatatatacc taggagtgga 39720

attgttgggt cacacggaat tttatgttta gctttggagt aacttgcaag actattttcc 39780 

atagtggcca caccattttg cttccaccag cagcctgtaa agttcccaat ttcttcacat 39840 

ccttgccagt acttgttact gtctttttta tcgccattct agtgggtgtg aagtagtatt 39900 

tgtgcttata ttcttttcgt atgaattttt tctttgaaga aatgtctttt caaatccttt 39960 

gcctgttttt taatcaggtt atttctcttt ttattggtga gttgtaatga atcgttgaaa 40020 

tggaaaatgt tagcctgagc aggaatatca cttgtggcca ggagttcaag accagcctgg 40080 

gagttcaaga ccagcctggg taacatagtg agaccccacc cttacaaaac agaaaaggaa 40140 

aaaaaattag ctgggcaggg tagcatgcac ctgtagtcct agctactcag gaggctgagg 40200 

tggaaggatt gcttgagctt aggagtccaa ggtttcagtg aactgtgatc acacaactgc 40260

actgcatccc agcctgaatg acggagcgag acgctatctc ttgaaaaaaa aattagtaga 40320

aattaacaga aataaaaaat gttgatagtt gtgttttctt tcaggataaa gttttacaca 40380 

atatcatgtc tatttttaca tttatgggag ccaatgtcat gcgcctagat gatacttaca 40440 

gttttcaagt tattaacaag acagtgaaaa tggttattcc cgcacttatt caggtaaggt 40500 

ctctttatac catcgtgggg tttctttttt taatttaaaa attttgcata taaatgacat 40560 

taaaagcctt ttttttagtt tttttttttg tttttttttt ttttaatatc ctcaacagtt 40620 

gtataaaagc ctttctgaaa tgctcaagta cattgccctt aagttttaga acattgaagc 40680 

taagaagcag tcttggcaat gtctttctca ataatataaa caaacgtgtt atagccggtt 40740 

ctgctgtgct gtgagtgtgt aactgtaagt gtcttaacat cattcgggtg gggtaaattg 40800 

acatccacca tctatttcag gacattgcat cagtttgata ataccttctt acatgttgat 40860 

gaggctagga acttttagag ctacagtgac aaatttgtta acatgcagtt tagcaaatgt 40920 

ttaccaagca ttctctaaga gctaaactca gagagagagg tgagatgttc tctctcccag 40980 

taaggaatga acaattcaaa agttaaaagt agcctcaaga gtaaaagagc ttgttcatct 41040 

acttgagttg tctatgtacc cttcacttgt taattaatgt cttttctcat gctattgacc 41100 

ctctgaggga tgaccatatt cagcaccttc ttttaacctt agtattagat tggtgttctc 41160 

agtgtccacc aagttggcaa gtataggata tatacctaaa aagaaaaata ataatatgca 41220 

ttcatatgat cagtgccagt cgaataagta aatgctaaag agttcagaat aggggaaagt 41280 

ctatatgggc caggatgacc agaaagttct ataagaaata gaatttgtac cggtgtgaaa 41340

gaaggatggg atttgttcag tgatggcaga gccaatagta tgagaaaagg catgatctag 41400 

gaatgtgcac agcatgtttg aagaatgtaa tggtgattga ttgatggaga gcagagggtt 41460 

gagacaggta gacgttggtt tattcattca ttcaagtgtt ttctgtcatt ctttctgtca 41520 
tttattcaac
ctgaaagcag 
ggcacaaccc 
tatcagtgtc 
attgtttctt 
gcaatggcat 
cctcagcctc 
atttttagta 
ggtgatctgc 
cagccatgta 
acactttgta 
atttattgtt 
gtaacaccag 
gcctggtcaa 
cctgggatgt 
tccaggctgc 
ccctgtctct 
ccttatcttg 
tgcttgctgt 
agattgtggt 
ggcgcctgcc 
ttctcctcat 
gcgaaaaggt 
ttatgatatt 
ctttttgctc 
cccatagcac 
ttctgagaaa 
ctgaagagta 
agaggagcgt 
agccctgttt 
agtagatctc 
cacattatcc 
tttctacaca 
agtttagtgt 
tgccagagga 
taaagagaaa 
gtaagttaat 
aattccattt 
ttggctgtgg 
aggacctcat 
agaaaaaaag 
agccttgaag 
tgaagccagg 
ggatcacttt 
aaaaactttt 
cgaagcagaa 
tttctgcgtg 
atcctgcagt 
gtgtatgtgg 
aggatagtgg 
gaaatctaca 
tgattgtcca 
ggtactctct 
gtaatagatg 
aggttatatc 
ttattgctgt 
aagaaatgct 
aatttttgtc 
taatgtgttc 
caaccagtag 
aatagaacaa 
aacataatgt 



aactatattt 
caggaagttg 
aaatcctgtg 
acatatttgg 
ttgtttgttt 
gatcttgggt 
ctgagtagct 
gagatgggtt 
ccagcttggc 
tcctgtttca 
ttccccttgg 
aggcctttga 
cactttggga 
catagcaaaa 
cccagcttct 
agtgagatat 
aaaaaaataa 
ttcacctaaa 
tttgcagtct 
aaaaatcatt 
catccttgtt 
cttgcttttt 
cagaacctag 
gtccagttta 
agaacctttt 
tctgaatatc 
cagttctgct 
atgaaacagt 
ccgtgtccct 
ggataagtcg 
gacccactgt 
tgaaattatg 
ggatgctatt 
ccagcatcag 
aaaagaaggt 
cagaagagaa 
cattttaatt 
gccattcctc 
aatgttgctg 
tagctgtctc 
taggtaccac 
aaaatcccac 
tgtgatggtg 
agcgcagttc 
tttcattaag 
cgacaaggca 
gccagccatt 
cagctcaggt 
tatatgtggt 
tttttaacac 
cctgcagttg 
gtttccctgt 
tggcaaatac 
aacacatttc 
tgttaattct 
ttcagaaacc 
acaggttttt 
agtgtccttc 
tttaaatgtg 
aagtagcttg 
gaaagtacgt 
aaactctgtg 



tgatcaccaa 
ttggcctttt 
tgatttgtgc 
catctgccat 
gtttgagaca 
cactgcaacc 
gggactacag 
taaccatgtt 
ctcccaaagt 
tacagaaaat 
ggctaagtag 
ttattaaaac 
ggctgaggtg 
ccccatctct 
tcggaggctg 
gatcacatca 
ataaaattct 
cttgttttgt 
gatagtggag 
agtgtatttg 
caacttgttg 
gaacagtatg 
ggtgggaaga 
aacatgaggc 
tcaaaaaggt 
ctcagttgca 
gccacgcaag 
gtctcttcaa 
gaagtcagta 
aggaccattt 
aaataaagta 
ccttcacaat 
ttagaagcag 
atacaaagct 
aagcgtagat 
tgcaggttat 
tgattgtgta 
acacattgtc 
gtatccagga 
tcagcttctc 
aggagtggca 
gaactgcagc 
ttagcctgtg 
aagaccagcc 
aaagggttga 
cgttagaagg 
ggccctgtca 
ttggcaggga 
gatgtgggta 
tgcactgcca 
gaagttcaga 
tttgtttttt 
actcacgttt 
tctgggaaaa 
gtatgaagga 
attcccaaag 
aatgtagaga 
atgtctcagc 
tttataaaaa 
gtaaatggct 
ttcaccatga 
actcagtggt 



tacctggagt 
ctgtgctctg 
agccttccca 
tactgcctcc 
gagttttgct 
tctgcctcct 
gcacgcacca 
ggcctggctg 
gttgggatta 
aaatctcctt 
cgcttgggac 
tcttgtggct 
ggattatttg 
acaaaaattt 
gggcaggagg 
ctgcacttca 
tctggcaatt 
ggaagattac 
attctataga 
tggatgcgct 
atacactggg 
tcacaaaaac 
taaggaaaat 
tgatgatgcc 
ttttggcatt 
gcaaaacatc 
tggttctctt 
agcggttttg 
agacacaaac 
ttaaatgact 
gtatactctg 
tcctaagtac 
acactgaatt 
tgatgaatat 
cgggtgtcac 
ccttagtaca 
gtttcttgaa 
ttttatacct 
tgagttaggt 
attaaaatat 
gcaaaatatt 
accactgagg 
gtcccagcta 
tgagcaacat 
cgtgttgata 
aaggtactct 
tcactgtttg 
cagacgaggg 
aggaagtgtc 
accccattta 
gactccttcc 
gtcttaaaca 
ttcactaatt 
tgttatcccg 
ttcaggtgaa 
cagtgtcatt 
ctcacactag 
tcctgtcttc 
ggtattctgc 
catgaaaatg 
aaagccgttc 
ttcattctta 



aaaatggttc 
cttatcttca 
ggcagtctaa 
tgttatcttt 
cttgttgccc 
ggtttcaagt 
ccacgcccgg 
gtcttgaact 
caggcgtgag 
tggggcaggg 
atgacaggta 
gagtgtggtg 
aggccaggag 
ttaaaaagtg 
atctctttga 
gcctgggcca 
tcttggggcc 
ttgcacctac 
agtttcaaga 
gccacacgtc 
tgcagagaaa 
agtgctggcg 
ggtgatttct 
taatgtgtgt 
ttaaaaaatt 
ttctgaaggc 
ttaagcccag 
aaggcagttt 
acaggtgatt 
tcttagacat 
ttctgctaga 
agatttgtta 
ttggttttca 
cctccagtac 
ttactgttca 
tcttcagaag 
tccttatgga 
ctgtgctctg 
tcaaatgaca 
ctgaagacat 
aaaattgtac 
ctggcttagg 
cccaggagaa 
agtaagacgc 
gttcttagtt 
aagagctccg 
gcgaggcagt 
aggtccctgg 
ttagcagtaa 
cattagacac 
gtgatgtggt 
ctcgcttctc 
tagaagagga 
ttaattgtcc 
aatgagtaag 
taataagagt 
caagcaactg 
caataatttt 
tgtctccaag 
ggaggcacgc 
gtcatgatct 
agtgttgtgt 



aaactttatt 
ttcttcttga 
gcagacttgc 
tcacacatgt 
aggctagagt 
gattctcctg 
ctaattttgt 
cctgacctta 
ccactgcgcc 
ctcatatcgt 
cttactgaat 
gcatgcacct 
tttgagaaca 
gtggtggggt 
gctcaggaga 
cggagaaaga 
catctaaaag 
tcattgtctc 
aacgttgaag 
ccggagcaca 
ttcctctgga 
gctgcctatg 
cacacagatt 
tatttagaac 
gcattcagag 
gaattaactt 
cagagtatgg 
gtggggaaat 
gtttggataa 
gggatctggg 
aatcctaacg 
cctttttctt 
gtctgttgtg 
ttactaaagc 
atctgaaagc 
gatttttaaa 
aaatagtcgg 
cgctgacgtc 
gtaaatgatc 
ttacatgaaa 
catgcagcag 
aaagggttga 
tgagacagga 
catctctaaa 
ctctacaatt 
cagagactgt 
aacttgtccc 
acagtatgtg 
gtcattctat 
attgaataaa 
ttcttccatt 
agggccactg 
gaaactggat 
tttacttgaa 
gtcacttgtt 
gaatcacaag 
cggcatttta 
ctgaaaaagg 
gaactgttct 
ctttaaagat 
actgagatgg 
acccatcaga 



41580 
41640 
41700 
41760 
41820 
41880 
41940 
42000 
42060 
42120 
42180 
42240 
42300 
42360 
42420 
42480 
42540 
42600 
42660 
42720 
42780 
42840 
42900 
42960 
43020 
43080 
43140 
43200 
43260 
43320 
43380 
43440 
43500 
43560 
43620 
43680 
43740 
43800 
43860 
43920 
43980 
44040 
44100 
44160 
44220 
44280 
44340 
44400 
44460 
44520 
44580 
44640 
44700 
44760 
44820 
44880 
44940 
45000 
45060 
45120 
45180 
45240 
agtttatgaa cctgtacacc aaaaaaacta caagaatctt tataatattc ataacattct 45300
tcatggtagc acaaaactgg taacaactcc agtgtctacc agcagagtag ataaattgcc 45360
tgtaSttat tttatttttt gagacaggtc tcactccatg tcccaggctg aagtgcagta 45420
gagcaatcat ggctcactgc aacctgaaac tcctgggctc aagcaatcct cttgcctcag 45480
ccLcaagt agctgggact acaggcgtgt gccacctaca tctggctaat t 45540
tttttataga gacagggtct cgctctatta cccaggctga tgtcaaattc ctggcctcaa 45600
gcgatcctcc caactcgagt tcccatagtg ctgggattac aggcatgagc c c atg 45660
agcctttctt tacctggata tagtgagttc tgcatgtatg atttgtatac Jtctctgttt 45720
gtgtattaaa cttgtttaaa caatgagaga aaaggaagga aagaaaggaa tacgaccagg 45780
aatacaaaaa tggtttgaca ttagagtact gtaaagtcca ttaggtccta taatgcgtta 45840
acagatgaag aagggaaacc atatgatcat tttacatttc acagtgaaat aattcagcag 45900
actcaatgct tattcacaac ttaagagaga aaaatcctcg ctcacacctg taatcccagc 45960
actttgggag gccgaggcgg gcggatcatg aggtcaggag atcgagacca tcctggctaa 46020
catggtctct actaaaaaat acaaaaaatt agccaggcgt ggtggcgggc gcctgtagtc 46080
ccagctactc gggaggctga ggaaggagaa tggcgtgaac ccgggaggca tagcttgcag 46140
tgagccaaga tcgagccaat gcattccacc aacctgggtg acagagcgag actctgtctg 46200
aaaaaataaa aaaaataaaa aaaaggaaaa aactcctctc agcagtgcag gatagatggg 46260
aattactttg gcctaataaa ggtgacaatc tgaagcttgt ggtttatatc atacc aa g 46320
atgaaatatt aatggtgtgc ctgctacagt cagggacaag aatctcctct gtcattgttt 46380
ctattcagga aaagaaatga agtgtaagaa tcaggaaaga agagaatcag caattagaac 46440
taataatgag ttcagtaagc ttgcttaaga gagatcagtg tagacactat ctagagatca 46500
gtgtaggttg tgatctgtat caagcaggct ggctgaacta attagtctca ccactgttat 46560
?g?tat?cct caaattagaa aacagagtag aaaaaaagat ttcctgtata atacaaacat 46620
aaacttacaa ggtaccaaga ataagcctga caaaaatatg tgcacaaata ttacagggga 46680
aatgtgcaag atatgaaggg acacagaaga aaacttggaa gaaatagaaa gagagacact 46740
gtgtttacgc atgaagaagc ctcactattt aaaggtggca gatctgccca gattggtcta 46800
gaaactcagt gtgctttttc aattatatca ttttaagaaa tgatattgat acattgatac 46860
caaactgtaa ggggccaagg acagtggtat ggtgcaggag taatcagatg gatcaatgtg 46920
actgaacagt aggcccagaa acacatgcag gcataggtaa gaggtgcagc aggtgatttt 46980
acaaagcagt tgggggagag aatggtccaa gtgctggccc tggcatgatt tgtttcccat 47040
ttggggaaaa gtctgatgtc acaccttgta gaaaagacaa ctttagtcag agtaaaaact 47100
tcattgtaaa aaacaacagt tcaaatgttt agaagaaaat caaggagaac atgattgtga 47160
ctggaggata ggaaagaatt cctttaaaat tttgattcaa tagaaagcca caggctgaga 47220
gaagattatt gctctgcata aatcaaaaag cctcagtttc cattagaaca actggcaaag 47280
aatatgaatt cactgaagag gaattacaga tggccagtta atgtacaatg tcatttagcc 47340
tccagtaagc aggaaatgtg attaaatgtg atacccatca aattggcaaa actgttgaag 47400
tcgttagtac cagggatggg gtacgtagtg gcagacagtg tgattgttcc aaccacttta 47460
gaaagcaatt ggcaggatcc catgaggttg aaagttccag taattcctgt gatctggtca 47520
ttcccttcct taggtatata tgctgtaaaa acttcccaag atgatcaaag ggacataaaa 47580
atqttcttct ctagcattat tagtaaaagg ggaaaaaatg gacataaact aaacatctaa 47640
caacagaggc aaatggatag ataagtggta tttatagagg aatatcatat gcagttaaaa 47700
tgaaggaact agaacataag tcaccatgca tagatctcaa aaatatgttt ccagaaaaac 47760
atgttgttcc gtgatatgta gtgtttgaaa catttggaag cagatttggg atgatggttt 47820
tgtttgtggg gcttggggta gagggagaat aggaccaagg agcagtactt gggggattcc 47880
agctgaatct gtaatgttta acttctcaac aaaggatatg tgagctcacc agatcagcat 47940
gtcatagaat tcggtagtgc taattgagtg ttgattgctg tatttttctg aaccattgta 48000
gtacttttta gatactacca gtagaatata attaaaataa attaagagaa atagagttac 48060
tacagtataa tgaagatgct atgaacctga ttaaattgta atgtcagaaa acagttgctg 48120
tggaaggttt cagtgaatgg tccgagtttc attgaactga cttgaaatga cttttattat 48180
tgacttccmt accgttctaa tgacgtcttc tttaaaaatt aggtagttga gagtggtggt 48240
cctgagattt taaaaggcct tgaagagagg tatgtttcct tttctttttt tactttttat 48300
ttqgaaatga cttcaaactt atagaaaacc tggaaaaaca gtacagaaaa ttcatatact 48360
ctttacaacc aggttcacca tttgataata ctttctgcca catttgcctt agcgttctcc 48420
ctatgtaggg ttttttctct gagccatttg aaagtaaatc atatacatta tgtctctaag 48480
aataagagat gttcttataa aaccacagta cattgatcaa attcagaaaa tttaacattg 48540
atccaatcca gtgatgtaat ctacagttca tactccagtc tccctaatta tcccaacact 48600
gtcctttaca ccgcatgtgt ttctgcatca tgatccatgt caagtctgtg tattgtattg 48660
agttgcagta tctgtttagt gtctcaatct ggccctgatt tctatgggct catgaattct 48720
ttattccgtg gattctaatc catatgctgc cattatatat ttgggtgcag gaattttcct 48780
ttctctccca tgtcctcggg gtttgcctcc tggccttgct gtgtacctcc ttttcagatg 48840
catggcattc ccagggccac tgtcctgccc cttccactgg aaagtgcctg ccctgcagcc 48900
aggaatgtga gctgctgagt tccagtcctg tgggctgact tgtagcgcta tgcgtaggac 48960
tttcgttttt ggagctgatt ttgtctgcgc ccccttctcc cctcctctgc tgtgcggtgt 49020

caacttctgt ctccactctt ccctctagct cccccctcag gctggcagca gtggggggat 49080

ctccagtggc agctgtgttg gaatttggtg tgtgcttttc taattacaag "atttgaag 49140

tttgtagtct ttttttttgt ctcctaataa tgtgaaagaa gtgggttata tgttcatttc 49200

tattaatatt tctctttata taaatggtgt atggtgtgtg tgcatgtgta tgtgtgcgtg 49260

tgtgtgcctg tgtgtgtgtg cacgcacaca cctgccatta acctagggcc ataactccct 49320

gaacacccca cttggacagc ccagtgagat ccactgagca tgagctagat atgagtgagt 49380

tccttaaacg ctttcagcat ctgttgctta ttcagcacct cacagagtct gagcgaggtc 49440

gatgcttctg cacagaagcg cctagttact tcttgatgat ctctgtctat ggcaataatc 49500

cctcatttcc tttctttccg agtagttttt gaaataattt agtacagaat tatttaactc 49560 

gtctttctac ttgaaacatt tagcataagg tagacatgaa caagttctct gagtkycttc 49620

agctgccgtt tgctgagcga gcctctgtgt ttcaggttgc tggagaccgt tctcggctat 49680 

atcagtgcag ttgcacagtc catggaaagg aacgcagaca aactcaccgt gaagttctgg 49740 

cgcgcgctcc ttagtaaagc ttacgacctg ttagataagg taggtgttca tttctccctg 49800 

gaattactgt tttgttgctg ttttttaaat tattgtgaaa ttatgtaaag tttttaagaa 49860 

taaacttagc aattcctttt atgaatataa tgaagctcag atttttctta gaaaacaatc 49920 

tcttgatgac tagaaataga tttgtctgga aaaaattgtt tccttaccag ttatgagaac 49980 

atgtacctcg tatattttaa taattaagtc atataatata tactgttcta ctttattcaa 50040 

gtggaataaa ataaaagtat gaagtatatt acatacatat cactggcctc gaagttagga 50100 

ttgctgattc aaatctagct cagtcattta gctgcatgcc tttggcactt cctcctgaca 50160 

tcaggctttt gttttgttcg ataatgtgtg gttggtaatg cctacctgaa aggttatttt 50220 

aagaattaaa taagaaaatg tgtgtggaag tgcctggtgc ctggaggtgt gcagtggaca 50280 

gcagtctgtc ttaattgaag ttgrtttttt ttttaatttg tttttctttt catgttaaat 50340 

ataaaatttc tctttaggtc aatgccttgc tgcccacaga gacattcatt cctgtgatca 50400

gagggctggt gggcaatccc ctgccatctg ttcgccgcaa agcgctggac cttttgaata 50460

acaagctgca gcaaaatata tcctggaaga agacaatagt gagtgaagac ccaggacaca 50520

cccatttact gtactttgct ttatcaaaga gctgcattta actcccttca atttctctag 50580

tggtacaatt tcaaattgaa gtatattatt atctgaatat aataccatga catgttttaa 50640

aatgtgacgt taattctaac gggaatatat gtgtgttgta agagctctta gaatggaatt 50700

tcagcctagt ggaattcttg cttactacag agcttacgtg gccttagcct tagcatagtt 50760

ctaagtccct tttactccag atcagtgggt gctttgtata agtgaaacac tggctagaac 50820

atgcacatcg cttagcatga ctgtgtgact caggaccccc gagtctgatt tccatttaga 50880

caatctcaag ctgtgacaat gggaaaatgg acaaacattt tttgcagctt taatgatttg 50940

gggattctgg ctgtcccctt caggttaccc gtttcctaaa actggttcca gaccttttgg 51000

ccattgtgca gcgtaagaaa aaggaagggg aagaagaaca agcaatcaac agacagacag 51060

cgttgtatac cttaaagctt ttatgcaaga attttggtgc agaaaatcca gatccttttg 51120

tcccagtgct garcactgct gtgaaactga ttgctccaga gagaaaggag gagaagaatg 51180

tcytgggaag cgcgctgctg tgcatagcag aggtgacctc caccctggag gcgctggcca 51240 

tcccccagct tcccaggtat gcggccggag acttggaaca ggagctgtta ccgcctggca 51300

cacattgaaa aataacactt tggtgacttt tttttttttt cctctgagta gacgttgcat 51360 

aaaattggaa tttgttaaag aattgatctt gcagggtgtg gtggctcacg gctgtgatcc 51420 

cagcactttg ggaggccaag gcagggagat catttgagcc caggagtttg aggctgcagt 51480

gagctatgac tgcaccactg cactgtagcc tgggcaacag agtgagaccc tgtctcttaa 51540 

agaaataata ataaaaataa ttaatcttca aacagaaaga tgcagccttc ccgtttatct 51600 

tcagatgcac tgctttgcag tgttaatcgt cctccattct tccttccttt tgtctagtat 51660 

agtcaaaaag cagtgggtga aggttcttgg cagaggggcc cacccagtta ggcaaggcca 51720 

aagctgcctt ctgctgccaa atttggaagt tagaagctcc tggctgtggt agttccttac 51780 

tgtacagcca gtatgttcct taacctgtta gagctttgct gtctataaaa cagagacagt 51840

atctacctca taggatttct gtagggcagt gcttgcaagg agttttgcat agtgcctgac 51900 

acatagtaag tgttcagtta atgaaagaca ttgttatttt ttaaattact gttgtgtcca 51960 

tactgttagc actgggagat ttcttgtgct tagccacata aaatgtatag acactgcttt 52020

ttaggctatg aggcagatag cgatgtaata ggcacagttt gctggttttt ttcataatgc 52080 

cctctactcc attccctagc ctctgtttct gcttcaagtt cttctaaagc tcttactttg 52140 

tttctagcct gatgccatcg ttgctgacaa caatgaagaa caccagcgag ctggtctcca 52200 

gcgaggtcta cctgctcagt gccttggctg ctctgcagaa ggttgtggag actctcccgc 52260 

acttcatcag cccctatctg gaaggcattc tctcccaggt gagccacgat agccacgaca 52320 

tgctacgcag ggtggctggg gaaggtaaaa gatcacaaat gtagtgattt tgttttttct 52380 

ttaagggtta gtttccgttt gacagaaagg agtcttattt ctgaagagat gtgcttgctt 52440 

ttattaaatg tgtacttttt ctcttccatc atccttgcca cttaatcttg aaattgatat 52500 

gggggtcact ttttagcgag atgcttctct tctttaattg tgcggaatgt attctggctt 52560 

ttttttcttc tgtgagatct ttggagttgg gctggtagag ccgcctttta ttacttgtgg 52620 

aaattgatgg gccctgtttc tactgtgaga ggtttaatcc taaggcatct tgcaaagcag 52680 
aatttttgta ttgcacgtca gcggtttcct gtcaacacgc taggggccag tggtatttat 52740
gattgaagat gctcaagagt gttgagacct ggctgttcca ttgtgttgat catttgccat 52800
ttacaaggtt aggagccttc attaccatgg ggcaaatgat taaaccgctt cacttacacc 52860
acagccaggg tttaaaagta gccttctcag agaaaggggc ctgtgctctg aggagctttc 52920
tcagtttttt agcaacttga ctcttgccgt ggtcccttgc attcctgtcc tctaagaaaa 52980
tcatctgtct gggtgctgtg tgtgcgtctg cgtctaccat cggccttcgt ccctgatcgc 53040
aagttgggca agaggtgtct gcccagtttt atccactgac gtttagggct gactttaaaa 53100
ggaactctgt gtgtagttaa aaagtgctaa gcattttcca aacactagta cctcatcatg 53160
aagcctctct gtaagtgtgt gtttgtcagc gaagatctgc agtggccctg atcacttttt 53220
ttatattcct gtaggtgatt catctggaga aaatcactag tgaaatgggt tctgcgtcac 53280
aggctaatat ccgtctcaca tctcttaaaa agacactggc taccacactt gcaccccgag 53340
tcctgttgcc cgccatcaaa aaaacttaca agcagattga gaagaactgg aaggttagat 53400
cttattgggt attaagactt tggagaaggg agaggtggta ctgcaatgtg taaactttga 53460
cttaaagaag aagatttagt ttttaaaaat acacaccgct gaactctagg aatcatttga 53520
aaatggcaca tctytttcct gtgttattgg tagaatcaca tgggtccgtt tatgagcatc 53580
ttgcaagagc atattggggy gatgaagaag gaagagctca cctcccatca gtctcagcta 53640
accgcctttt tcctggargc cctggacttc cgagcccagc actctgaggt aagctcaggt 53700
tcactcctcc aaatcctaaa gctgttatcc acctattgaa gagtccgaga acaatctagc 53760
cgtgtaacag tggtacttgc caaaataatt tacagtcagg tatcagcact ctaataaatg 53820
ggctcttctg ttttcagaac gatctggagg aagttggaaa aacggaaaat tgtatcattg 53880
actgtctagt agccatggtt gtcaaacttt ccgaggtcac attcaggccc ctgttcttca 53940
aggtgacgct ttcttcctct tcctctcttt tgtgctcaga tttaagtgac aagatgtcga 54000
agaaactaat actgtcttat cctttcagct gtttgattgg gctaaaacag aagatgcccc 54060
aaaggacagg ttgttgacat tttacaactt ggcagattgc attgctgaaa agctgaaagg 54120
gctttttact ctgtttgccg gccacttagt gaagcctttt gctgacacct tgraccaggt 54180
gaacatctcc aaaacaggtg ggtggtcgct tcagttcagc tgtcaccctt gycctgtgtg 54240
ccttatacaa ggtaccagta agacacaasc tagtgcttgg aagcaagatt ttagaggttc 54300
ttaagtccta agctaaggag ggtgaatttt ttacttcatt gttgtcagac agtgaagtgt 54360
tgttttagca tsatgagcga atgttataag tcacctcact tgggagaaag aataaagtaa 54420
atggccgtta ctgtcagtta ctggaagtaa ctttaaaacg caaaaattaa gattagttta 54480
ttctctaatg ctaatacgtg acccacttga cagttgaggg aatttaactt gccagagtga 54540
gtgttagaat cttaactcta aaagaaggaa tcattcttcg agagaccatc cattgtgttc 54600
tgccttcgtc ctgattatga tttctagatc attttagact cttctgattt agttatttta 54660
atagtgattt caatgtaaac attggagatt aaatggttaa gttcagtgtt ttttcatttt 54720
tactcttccc acgtctacag atgaagcatt ttttgactct gaaaatgacc ctgaaaagtg 54780
ctgcttgctg ttgcagttta ttttgaactg tttatacaaa atcttccttt ttgataccca 54840
gcattttata agtaaagaga gagcagragc cttgatgatg cctctggtgg atcaggtaac 54900
caaacagaat catctttcgt tgctgaggaa gccatcactg ttcgagcttg gcaggtttta 54960
gatctgctca aatgacagct tgctaggatt tttctttgta aggttgtttt ttctggtctt 55020
tttaatgaaa tgttttattt gaaaattgtt ttttcaaaga tataacttca ttgtggaatt 55080
tgagttttgt tttgttgttt tgagacaggg tcttgctctg tcacccaggc tggagtgcag 55140
tggtgcaatc tcggctcgct gcaacctcca cctcccgggt tcaagcaatt cttgtgcctc 55200
agcctcccga atagctggga tcacaggcgc ccgcctccac acctggctaa tttttgtatt 55260
tttagtaggg atgaggtttt gccatgttgg ccaggctggt ctcgaactcc tgacctcaag 55320
tgatccaccc accttggcct cccaaactgc tgggattaca ggcgtgagcc accatgcctg 55380
gccctgagtt attttttatt ttttattttt taaactgctc aaaagacaga catgatataa 55440
gctgtcatgg ttaatgtatt catttgtttt aactgggcct cactgagaac gtgtcagagc 55500
atggctgtcc tctggagcaa agagctcctc tgagatgttc tgctggagag aaaggggcga 55560
tgataaaggg tatgagattc catttgagtt ctagaaggca atacatttgc cctagagagt 55620
attagaataa tgcctggttt ttgaacctgc taatatcatg aaagatagta aaatatttgt 55680
atgttggaya ttttttcaaa cagaagaaag aaatgagact tgttgttcaa agtctttgtc 55740
ttactcttcc tagctggaaa acaggcttgg gggagaagag aaattccagg aacgggtgac 55800
aaagcacctg ataccatgca tcgcacagtt ttcrgtggcc atggcggatg actctctttg 55860
gaaaccactg aactaccaga ttctgctaaa gacgagagac tcctcgccta aggttaggaa 55920
ctgcgtgaca agtactgaaa ctgcatcttt tcactgcata gcctaaacag cctgcaggtg 55980
gtcacagatg taggccccct agtaaactgg cagggaagaa cattgctatt tgtcatccag 56040
tactagaggt ctggtctttt gtattaggca gaaatttgtg cctaagaaaa ttttgttttt 56100
tacttatagg ctatttgtaa agaatagcaa tgccctatcc ccaccccaat ggtgcgttgg 56160
gcagaagtac taaaatactt tctaaatact tttaaataac aaagatttga agtaattcct 56220
tcttttttaa gtgctattga aatgtcattg tgctccagag agaaatgcat tgaatttatc 56280
cacactatag tcttttgagt tagagcaata cattctgttg tttatagata ctaggcttat 56340
gatcttgttg agtaagtggt ggcagtggaa ataaaacatg ggtaggttga gtggaagtca 56400
aaatttgaca cccaagtgtt acaaatttca aactgtttta gcaatgagtc accatttctc 56460
aaactaatga ttgtattaga actttactgg ctataacaga ccaaagcaga attttattaa 56520 
taggatttat tagttcaaat tctttattag gatatgctaa caatgagact attatgatga 56580
acccccacat tttaaaccat aggtttttat acaagtagca gtatcaaatt tattgtaata 56640
ttttactttg cacaaatcaa ggaaatcctg attgaatcta ggtagttaca aatggttttt 56700
taataattca ctttgtactg tccagatatt ctttaattaa aatatccaaa attagaagac 56760
cagtagggct tttgtttcaa aagctaacaa tacccaacta ctgcttgctt tccttattat 56820
agacaaggca tgtgaggaac acctagacgt ttttaaagcc ccaaaactag gatattgtga 56880
cacagaatgt tttttattca gtgctgagat gggagtttat cccaagcgat tgtcctggcc 56940
cctcatacaa ctatagtggt gaccagtcag gggctgtggg tgaggagacc tgccagaggt 57000
ggctggaggc actttattgc cagcctggcg ggctggcgag cgccaatgaa gtaaaatctg 57060
gtgacacact tgagtttgca tgtggcattt tattttattg gtttttttta aaggaagaga 57120
ttataaaaag acatttcaca ttaaagattt gcagtcctgg gacacagttt ggaaaacact 57180
atttataagg ttgcacatat tacaaacagc tcccaaatgg tgaaactggt attctaagat 57240
gaaagcttaa tgaacataat gaagtgaata aacgcgtgtg aactaatgtt taaaaagtta 57300 
gagcttgtct caagtcagta cagctcttaa gataataaat acagtaacac tactttttat 57360 
ttctttgctc ttttatccct ttcaggttcg atttgctgct ttgattactg tgttagcact 57420 
ggctgaaaaa ctaaaggaga attatattgt cttgctacca gaatccattc ctttcttagc 57480
agagttgatg gaaggtaatt cccaaactat tcccaaccat taaacaatta aggaataatt 57540 
cagtaaaatt ctgtgaaaat gtgcccaatg ataatgcaga tttggatata acgtgactag 57600 
cagttgatcc agtcctcata caggcagctc cagtctcaaa tacaggcagg ctcgagctcc 57660 
tgttttgagt tgggtggagg gagcaagttg cgaagtgacc acctcctgat catttgagag 57720 
tccgtttaca gtgtaatctt cacgatcttt tcttttgtgg tggtttgtga tcatacaagc 57780
agattacagt agtgttttta attaacctca tacatttata tgattgaagt tttggtcccc 57840 
agattgtatg gaaatgccta gtggcattaa ggatgcggta ggatgtccac ttttagtagc 57900 
aaccgatgtt cattcactac tccatgttag gtgctttact tggattatct cacttaaaaa 57960 
ccacaacatt ttatctctgt tttacaaagg aagaaactag aggcttaaaa gatttcagtt 58020
atttgacaaa gatcacaagc tagtgggtgt gacatgggga gctgtgacag ttctggaaca 58080 
taagtcttag gcccaggaaa taacagtaaa tgttattatt aagggagggg tggtggagca 58140
agtagatcag tcctttactg attcattgct tatctaagct acaaaagtac attctccttt 58200 
gtttcttagc tcttggaggg gggaggtgtg agctactaaa ggggtggcat ccctaggaag 58260 
tttgagtttt ggggattctt attcagcttc cagtgcaagc ctgtgggcaa ggaatgaagg 58320 
cggaaggagc ggtgtggagg gaggcggtcc gtggcgcctc ctgctttgtt aatgtgcttc 58380
atttcactct tttgattgaa tgattgctgg aaagtgcaag gcattaagaa ttaaactaat 58440
gagaaccgag gcaggcagac tgactcagat tttaattttg attttttttt tttttttttt 58500
tagatgaatg tgaagaagta gaacatcagt gccaaaagac tattcagcaa ctggaaactg 58560
tcctgggaga gccactccag agctatttct aagactttct gtggtgtttc atactctact 58620
cagagttcac actcatattt catattttta tttttgggtg ttgggtgcca tgttactttt 58680
ggtgccttaa tacacctact tggattactt acaaatgttt tatcacttcg ttacaaaatc 58740
cccacctggc ttgtgctgcc acataagcct ctcctgccta tcgtatagag ctgcagaaag 58800
agtaaatgat acacggtatt tttatacaga ctgctgtgtt tgtttaaaca tttattattc 58860
tcttcctgat tgatggtaat aatattagac ttgttaattt tagcacccaa agctgacgcc 58920
tcatttgcac tgtaagcctt aactcttctg tacagcagta tcttatatac atggtatcca 58980
tgttgcagat ttcactcaaa gttgctctat ttcaagaaaa tgaagttatt tagcaatcaa 59040 
cagaagtact tttgactgta aagcctactt ttcattttgg gtaggcgaac ttcagccttc 59100 
gtttctttgt tgtgcccata aagagaagtg gttctggaat gcttttttta acccaggagt 59160 
gtgactgtca cctttatcct ttgttctttt gggaaacggg agagatgaag gcaacacgct 59220 
gcttctaaaa cagctcatac ctggctgctc acacagaggg cccagaaaca ctgggtggca 59280
cgaggaagct cctccaggat tcagaatgaa cccagttcca ttggtggtta actaagaact 59340
acttgtctaa gaaagtaagt atcagtagat ttttttcaat gctttgaagt gccctaatct 59400
ctagtactgg gtcatggtga agttggaaag tgagggttca aataaaatta gatctgcccc 59460 
ctttttaaag gcatctaaga acatcccgta gaatgttcgc attgagttta aaagcctttg 59520 
aagctaatat agaagtttaa tgcagaaatg tggcaagcaa aaggccaact tgcattttag 59580 
agacagagct tgctagtata gaagtgcata tttctaagaa atgtctcagt aaaattcatt 59640 
aataaatact aatagctttt taaatttttt tcctccacgt aaaatgaaaa tgaacttaag 59700
acttacaagg aatgatgtga gtgggctatt tttttcctaa atgcctcaca gtctccaaca 59760
gctcagttaa gcactctgat ggtctttatg gagaaaaata actctggggg attctcgagt 59820
cttcttgcct tctacagctt tccctgtggc accctccacc cccaaccagt gtctttgagc 59880 
ttggtgagac tctgcataca ttaatatgaa aagctggaaa agaattaagg gctagcratt 59940 
tctacactga gtacttttaa aaaaataaga ttgacaatgg tattcttttc aatatatgga 60000 
tagaaataat atctaaaatc tgttctgaaa atattcctaa atcctggcag agttctgccc 60060 
cagtctaatg tgataaaatg atcctcactg actgagcttt tttccctccc cactcccaaa 60120 
cctttgaaat ttcctgtaac aagatctgac ctgtaaagat cccatataca tggaacttct 60180
gccaactccc ctcaaataat agaaaaatca cagaaatgga accatttaac ttgctgatac 60240
aggtttgatt ttaatcacgt aaagtggaag agtgctttga tctcagttgc taatgatgat 60300
ggactaatct ttagaagaac ggaggccttg attggtgatt tcatgggaac aaaacagatt 60360
tctgcattac ctctgcctgc tttgtgggtc ccttagttgg cgtacacttt ctgatagtgc 60420
ctgccttatg aatgcctgaa gtctaatcct gaccttactt tgttgcccat aaatatcttt 60480 
caaattgaat gcatcttgcc tcttcctgaa actttctgta ggcgtctgtg tagactctga 60540
tttgccctaa gttcaaatac acagaaatca gctgccttgg gcttggcatc aagttaattt 60600
cagcaccatg tatttgtttt gctttggttt tttgccttaa tttctgtgct cagatgtcat 60660
gccaattact tggtaaagct tgcagctaat agagccacat gtttattacg ataaaatgat 60720
gtcagtcaac atcagtcctc agtgatggga tacggtagag cactggttat gacagtgatc 60780
actggggaag gagttgtgtc ttctgtaaat tagtgtcttc gggtgattta atagctgctt 60840
aactaatccc agaggacccc ttaggagttg atgtaatgcc tgtagaaaaa gcagtgcgca 60900
gcaatagagc acagaagagt cgaagctcaa gagaacaggg aacatactgg tttgtttcac 60960 
caagtatata tcctagctcy tcacagtttt accaaaaaaa atctgtgctc aatacatgtt 61020
agctactgta ttatagtagg tactccataa atacttgttg aatggcattc ttgaagagtc 61080
agccaaagta ggagagaggg cagagtggct ttaactggct ataagtgttg ctgcccccag 61140
tttccccatt cagtgctaga caccaccata cttcatggca aggactgaga acacccagca 61200
ggatctaatg gtgcacaagc tcattttaac aatataaaca atagaatgac agtgagatgc 61260
gtttcagcaa ggccagtatc acagaagcca ttctgtattt tggtttttgt agcagctgtg 61320 
taggtaggct accagctcct tacttccagt aagtggatgt ctccattaat ttccagcgtg 61380
tcaatactgc tgagctcttt aaatctgtgt ttgtactcca ggctgtgtac gccatttact 61440 
gcaaccttga attctctaac atcacagtaa attatcatct gaaaggaaag gaaacatgcc 61500 
attagtggtt ccccttagtt taaaattttg tgtttgttac atgctaacgg tgtcagagta 61560 
aaccatttga ttatacatag aaaagttcca gcctgctgct gctgacttcc ttctgtgact 61620 
tccaagaccc taatctaact aaagtgtccc tggtggtggc actgtctgcc tcatcccacc 61680 
cagggaagtg actcggacat gtttggagcc tcacaccatc cccagggcct gcggccctcc 61740 
tttcctcaac acagcccctc tccctcggcc gcatcctgcc ccttgctcct ggaaagccag 61800 
agtgctgccc cctcctcgcc gatgctgggg actctcctaa agcaggaaaa ctcctaatat 61860 
ggttgcttta gttcaccgag atgctgtttc cagctgtagc ctggaaccaa gggacttgga 61920
tcctgacttt tctgcctcaa actctaggac catgaacaaa ccagacaatt aatatgaaat 61980
ttcagcagct taaggcactt gctagggttt ttacagctgc ttggtgatcc tgatcctgat 62040
aagctcagca catcttggcc tcagtgtgtt tgtccaaagc cgtcaaagct gggtatcccg 62100
taagttggaa gtggtgcaca tactacaaat tcattcagat ccatcggcag tgttcgccct 62160
tcacagtcat ctatgcccaa aactcaaaca atgaaaaacc aaaaaaccta ctagcctttt 62220 
acatcacctt tacacctaca cttcatactt ctggtgccac gctgccagtg ccaggctggg 62280 
tttccaccca ctctgagaag gccactggga ggccctgcac tgagctctcc ctgtgctcct 62340
cgcccccatg aaggcttcag aagaaagagg ggcagctcag ccttgctgct aacccggccc 62400
caccccgaga ggccagaaag aagcacaaaa cctgaatctg caaagcagca agttgagggt 62460
tttcattctt ttgattgata ggagcagtgg aaaggactgt caaggaaaat caggaagggg 62520
aggaaatttc agcccaagtc ctgcactctg ataacagagc tggccaagat gaagcttctg 62580
tggctacaga ggattgggga aataggtgta acctgagatt ctatccccac aagaatggtg 62640
ggaatcccag ttccaaccag gtagaaggga ccccagatat atttaacaaa atctcaagaa 62700
aacacaaaac acacagcctg atgctatagc tcagaaggaa aaccaagttg gcagcagagg 62760
gagtcaatta gttacctgta ctgtatcaga gcaagtgcaa agccagtgaa ttacaagcaa 62820
crtqqcaaaga aatgacatgc ttccgggtga gcagactact gcaaatgaaa agaaggcaca 62880 
tcctttgcct agaagaaaac agaacaacag aatccctcac aaccatctga tgctgccgaa 62940
cagctgggag gaacacggat ttcatccaga ggaagagccc acagatgaaa ttacaacact 63000
gaatgagatt gtttttaaaa cactgtagat acaactaaag tgtatttgaa agagcaagaa 63060

gatttaaaat gaaagtaatg gcataatgga aagacctgag ataatctcaa tgcaagcata 63120 

tttaaaaggc aagagatgga agcccccgga aaggacctat cagtatagaa aatcgtcagc 63180 

agtgatcaac acaaggataa ttactgctcc tgaattagaa aattaaatgg aaaagacgat 63240 

tgttcaaaag catataacat tcatctattc attccttcag ccagccatcc ctttcactgt 63300 

ctctgcaaaa aaacgcctgg aatgtggttc acctaatcct aatggtatgg tttgaattac 63360 

gaagttggag gtattttcat cttagtattt ttcagcatta tttgaatgta tcattattat 63420 

ccttttttat attaatatag cagatccaaa cttggggccc agataggtca aaaacaagtt 63480 

acatcattgg tttcctgtca cccaccacaa ggctcaactt ccttctagag cactcatctc 63540 

tgtttccctc cagtgaaccc caagtcaccc tttgcacctt gactcttgcc aagccaacaa 63600 

ccctgtgctg cccttcccac ctctaaggac ttgttcaaat catgccctaa atctagaaaa 63660 

gccaaccttc agccttccac atccttccca gctttctgaa ttcgaaacca atatccacaa 63720 

cttccaggga accttccatg attagcccac atctgtctac actccacata ccagggcact 63780 
gcattagact ggaacttagg gataaatacg aatgtcacat tgctatctat ttcgcctgat 63840

attcatgctt ttagtaacgg gttttgatga gtcctcaaga acccacatgg tccagacaat 63900

gtgtcagttt aaaatgctgt agaatgataa cctaaaccaa tgaccctaag tgactgtgtg 63960 

aggcagtggc acctgagctg gtgattgagg ctggccaaac acccggcatc cacatcacaa 64020 

ggagactatt ctctgctcag catcccagaa acatatttcg ggaaaggtat gtgttacaat 64080 

attcacctat ctggggactc aggactgcat gccttttagt tccaagggac tgatgtattt 64140 

ttccttgaat acatataaac aagactaact ccttaggaga aaggactgcc acctaccctg 64200 

atgggagatg atggccagtg caagcaagca gttttagggt ctccagtcct caagtctgct 64260 

tatatgatta aatgtgtgac tccagtcctg agacaagctt cctggttaaa ttcctatcaa 64320 

tatggggcta cacaggaatt ttaatatgat gacaaatgtc aattccaatg tgcataattt 64380 

gacactctac tggctttgcc atttatactg attgcccacg ttataaaaag gaaggggcaa 64440 

attacaatct ccaagtaaac aaaaaacata cacaacctcc acatactgag ttaggacttg 64500 

atttaaatgg accacttatc ccacccctgc tcccaggatt cttattgctg tcccattttc 64560 

aaaaactgga acctcacctc aaagtacatc ccaggactaa atgggaaaga ggtaatattt 64620 

ctctcttctt ctccccagga ctcctgaaga aaagaatttc ttacaaatgc tttaatattc 64680 

aggcgtgggt tcaagtgtag agcaatatcc tttgattttc ctgctagtag gtcaacatta 64740 

aagcttttga gaaagaaaga aaaaaaaaac cttaattaac caagtaggac agagtagtgg 64800 

caatttcagc aatgttagtg accagacttg aactatgatt ctaactctat gaaacacacg 64860 

tatccacatg gacatgggaa tgcaggtgaa gttgaaattt taaacccagt cagcattcca 64920

ctaattcgtt acttgcaaaa taaaaatacg aaaaaaaaaa cccctttaaa atcagcattg 64980 

aggctgagat gtggttggac agaaatggaa ggtttaacgc taaactgaat gactacttca 65040 

ggctgcagga gcctgggatc ctctcaagca ggtattctct tttcatcatc tacccgtgtt 65100 

ccttccttct tgaaagtatg ccactttcta ggaaatagct ctaatttcaa gctttcagaa 65160 

ggctggacgt ctgtggaaaa agttgttaaa gcagaaattt tctaaatcac atgacaagag 65220 

gggccctaaa aaatgttgcc tgggctgggc gtggcgactc acacctgtaa tcccagcact 65280 

gggagggcga agtgggcgga tcacttgagg tcaggagttc cagaccagcc tggccaacat 65340 

ggtgaacccc tgtctctact aaaaatacaa aaattcaaaa agtagctggg cgtggtggtg 65400 

cgcctgtaat cccagctact ccagaggctg aggtaggaga atggcttgaa cacccaggaa 65460 

gcggaggttg tggtgtgctg ggattgtgac actgcactcc agcctgggtg acacagtgag 65520 

gctccatctc aaaaaaaaaa agaaaagatg ttgcctgatc ttacggggct gctctggagt 65580 

tctacaaggc aaggataggc ctcgtggaca aaaatgctac tctcttgtct ggccagaaaa 65640 

agccgaggtc agaacacggc cttagtcggc tctgctgccc actggctgtt cacaatctaa 65700 

ccacctcctg cttgaatcat ctcactttgg aaatgaagat tctgacacta gccctacttc 65760 

cagctcacag gaccgttcac gggtagaagg cggtaacagg cacggaagta tctgcactgt 65820 

gactggtacc gaaggatact gaccttttgg catttgcatt cacttctcct ttaacgacga 65880 

cagttcgtcc agggcccatg ggggtgttca accttgcagc gaatggcagg ctctgaaaag 65940 

aagccacagt caggaccaag aggcctgcag aaagcgatct ccagggaaaa ctcgtgtcca 66000 

aagaacacta cagaatgttt cagagccatg accctgtaaa tcccaggagg ggagagacag 66060 

tctgccttgg tcccagctgg gtcccctgca gcagcactgt ccctggctcg aagcaaacac 66120 

tcatgattac attttagacg atgaaatgag tgacagggtg ggacccgatg ccctctgtat 66180 

gagtgaaaca cgccaggaag gccccctgtg cctggggtct gaggcagtac tgtgttctgc 66240 

ctgaagggga agcagccaag cagggaggtg ctgaggaaat acacaggaat ggctcagagg 66300 

caggcctggt tgactctcga atccatccag ggacggcaca caaatgcaga gggggctgct 66360

ttgggcttct attgtggata caggttactc gtaacagctc attacaactt aatttttata 66420 

cagagttaag aaaatttggg gctcttcaaa cctttgacac atagttcata ggtggtattt 66480 

tggtgcaagt cwaagtgtga ttgacagtcg aatmtttgct cttggtgtag acagttctgg 66540

gtgcgatttt agaaatgtct cctcctctat tactaggctg tagggaaaca gttctacagt 66600 

aaggaatgga atgaatgaag ctgccctcca cggtttaaac tgttcatttt ctatgcaact 66660 

ttataaaata ttccacatga aataacccag gcaaaaatac tcacaagctg gggcgtgcca 66720 

gactttggaa cctattggaa aagaaacaaa acacaacaat gttagaaggg gaagaattat 66780

agtttataat ctgaagtctt ggttgtgctg agctgagcct ggccggagcc tgggatgttc 66840 

ctgctccact ctggtgtgac ctccaggcag ctggtgcttt atgacggaat ggtatggtgt 66900 

tgtgaagggc tacacgggtt gagaagagag tctgatatcc ctgttcatct gagtccttta 66960 

tcctccacca taaattaata atttttcata gaactatatg aaattttttt aagagacagg 67020 

gtcttactct gtcatccaca ctgggatgca gtggctcaat catggctcac tgcagcctca 67080 

acctcctggg ctcaagtcat ccgcctactt cagccaccca aagctgggat cacagccatg 67140 

cgccaccatg cccagataat tttttaattt tttgtagaga tggggttctc cctatgtggc 67200 

ccaggctgat ctcaaactcc taggctcaag caatcccccc acctcagcca aagcgctggg 67260 

actacaggcc tgagccacca ccttcaccca gcactacatg aacttcaaac gtgtcttagt 67320 

tgtcctttcc agggtgcccc cagaatgcta agattctatt ctgtctgtga ggtgtgaacg 67380

tgccagtcgg taagacctca ctttctccat taataacagc gcatttttaa attgcagtta 67440 

ttctcccagg ggttgaagat acacatggaa gttcatttgc cagtgcactg gctgatgttc 67500 

aatatttgga aataccccag aacagtacta ttcagcagtt aagagtgaga gagtgctttt 67560 
tacataagaa cagtgagtgg cagttacaga aactgacagg tgaatcatca gcaaaaactt 67620
actqcgtgcc tggcaccaca agtatgtatc tcaattttca agtgattact tgcgttctaa 67680 
aagcataatt ttcaaaaacc cagccttctt gccatttcat taacatggca gctaaaacaa 67740
catttttaaa ccagtggctc attaaaagat ttaatattta cattttctct acttatctct 67800
gtcagttcca gactagatgc ttgggtactt tgtaagtcct gagaaaagat aataaagatt 67860
aatttgtaac atccagtaga aaatattttc tatatataac ctgtttgaca tataaaaaaa 67920
ttataaatgg tactaaaaat atgcttgaaa actatgatcc aaaaatatac ttgacactgg 67980
qgtcctttcc gccccccatc agattggcaa acattacaga gaataagatg taatgtcaac 68040
aagaaggtgg gagcatggct gttgagagtg caaactgttt ttacccttgt gaagggcgag 68100
ttagtatcta tcaaaatttt caatgcaaag acagtcatgt gttgcttaac gaagggggta 68160
tqttctgaga aatgtgtcct tacatgactt ttgtcatcgt gcaaacatca cagaatatta 68220
cttaacaacc ttagatggta ctgccaacta cacacctagg ctgtagggtg tggcctatag 68280 
ctcctggcca gtgaatctgt acagcaggca ctgtaccaaa tactgtaggc aaccagagca 68340 
ggatggtaag tatttgtgca tataaactta cctaaacata gaaaaggtat ggtaaaaata 68400
cagtatatga caatcttatg ggaccacggt tgtatatgcg gttcatcttt gactgaactg 68460
tccttatgca acacatgact gtgctctttg actcagacac tcccagccta ggaacctgtc 68520
ttacagaaat actccatttt atgcccatgc aaaggaaata gctgctaccc agctaaaaac 68580
tgtgtgtgtt ttattaaaac cacactaggt ttgagatcct acagacaatg aagccctggc 68640
tgataaacag ccttctgcta gtgtgaacag ttcacttcca aaaaattaat actacagtca 68700
gggagatacg tcctttttat gtcaatagaa gtgttatttg aagatttttt ctaccatctc 68760
atcagaaacc atcctcataa aagaccccaa gctgtggaag gtcactcacc gagctgaagc 68820 
taaaaccaat tgagtgaata ttcactttgc cataaatgcc cagagtgtct attttctctg 68880 
ggccgatcct gtggccatag agcagagtat gttttccatt tacagccacc tagagagata 68940 
cagaagacag agcccccccg ccaccaaaaa aaaacagtta ataaattaac ttattatcca 69000 
aatttaaatg tttccttgcg tccaatttta tacctgttcc cacaggtctc cacacacgtg 69060 
accaaatggt aaccgaaagc acccaaaaca ccagcagtgc ttccgaactt cttttgagtt 69120
cctgatttga ttaagcacca ggaactgtct ctccccagac aaacaggcag tcataagtat 69180 
atgtgcgtgt gcttgtgtgc gcctctgcat acatgtgaga catctgggtt cgaaggtcgc *<«40 
ctggaagccc aggcacaggt gccctggaga ttgtgaaccc aggttaaaga tctgcctgcg 
agctccccac aacactgtcc ctggagacca cccatgtggc agacactcaa aatctgttgt 
atttgctaaa taaattaata tggacaagtt acccaacata tttgaattcg gttttcttat 
tctgaagtag gaacaatctc tacctcaaca gactacagta agatgacgag aaagtagctt 
ctgcaggtcc tcggcacgct acgagcctcc tccctccctc cctctcaaat gccaggtacc 
acgttaggtt ataggcagaa aacgcgagca gcatacaata tagtctctga ctcaaggagc 69600 
tttaaatccc attgtggagg caaaataaac acataaagcc aaagcacaac tgttaaaaac 69660 
tgtgaaaaag aatgatacag acacgggctc aaaaggccag agaagggaga ggtgaatgga 69720
gtttgggaga ttcagaaaga ctttcgggca ggagcagtat cccgaagatc agaaaggatc 69780
tgtgccagag gcatgtgagt cttctctcac gttggtacaa acctctactg tgggaaagct 69840
ccacaggcca gagggccagc aggtgtaaga gcttctggga ccaaattcaa ccatacacac 69900
gcgcgctagg tataagacac tccatgaagc cagagatgga actgacatca aggatttcta 69960
ggatgtcctc tgggtctaaa aatctcatga tcctacagaa ctgtcttgtg gcaagacttg 70020
aatttacaaa acccttctga tttctatgct gggattgccc ttgttagttc ttcaggtggt 70080
gacggttata ttataactta ccctcccatc atgaacataa tactgtgtca attcagaact 70140 
cagccctgct aggagaaata ggatctaata aaaagatcta gtcacagtgg accatatttc 70200 
tgcatggcaa caactggctg gagagtagtg gtgccacccc ctcatattta ggtgccttcc 70260 
tggtcctttt atggagaaag actacaaagc caaactatat ctcaggtgta ggaaatacaa 70320 
ggctgggctc acaaatgtga aaggcccccc tgtgcactcc tgctcactaa tgaggactca 70380 
acctgtccct ctccaaaacc tacctggaat ttgtccttta gcaccataat cacgatctca 70440
aaagactttt ctcttttgaa aggcgtgtca taggtgatct cttcccgtcc ccatttttca 70500
tttatcaaag tattgcaaac aatgcagccg gcccttttga aacgaggatt gaaatgaaag 70560
gccacatcgg ctcgaggttt cacactgctg ccattctgca gatccacctg gaatctatat 70620
gagggaagac acaggggcct catgagtgct cagaaaggaa agccaccagt aaaggcccag 70680
caggcactac actccaggcc cagtgacaag ggactgacca atccctcctc attcattcac 70740 
tatgaggcct aaagagctga ctgtctaatc ttttctgtct tcctcagtaa aatagcccct 70800
gagtttgaac tgggtctgaa tctgcccaga ataaaaacct acctttccca gcctccctcg 70860
cagctaggcc tgcctgtgca gctaaagctc tgctgataag atggaagtga agtgctcagg 70920
gaaggctact agaaactgcc ttaaaagtca gcaagcactt gccttttgtc cagctctgtt 70980
tccccgcaaa caccacatgc acccatccag ctacccaaag cacagaagtg agggctggaa 71040
atctagctgc caaggataat gggaaagggg acctcacctc accctcaaga tagcagagtg 71100
gtaagctgga aggagtctac ttctctaacg acttgaagga gttaactgaa aaggcctggg 71160
ctgtctcctt gagacttaat tttcacatct taaaccactg ctcttcaggt gttaggtaca 71220
cagagctgag cccaacacta atgataccac agcttgtatc caaagcaagt attcagtaaa 71280
accatctgaa ctagatctgt gatgctgtgg gaatgtcaga ctaaaactag actctccctg 71340
tcatctgacc acaattcctt tctttaatga catacctaat tctttttctt tctttttttt 71400 
tttttttttt tttttttttt tttgtcaccc aggctatagt gcagtggctc catcacagct 71460 
cactgcaacc tcaaactcct gggctcaagg gatcctccta cctcagcccc ctgagtagct 71520 
acaactacag gcgcacacca ccaagcccag ttaattttta attttttttt tttttttaac 71580
agatgggggg ggtctcacta tgttgcccag gctggtctcg aactcctggc ctcaaacaat 71640
cctcccgcct tggcctccca aagtgctgga aaagtatgtg taagtcacca catccagctg 71700
atagttctaa tttttttata ctgaaatcag acttacagca atgaaataaa tcccaataac 71760
ctttatttag tcattcattc caatctcctc agcaccttgg ccagggctgc tgatgaaagc 71820
taattgggcc tcaragtatc aaatggtctc tcccgcctta ctctgttatc attccacaat 71880
cacaaaaaga cagcctattc atgctccttc ctttagcaca gtgattttac ctgtctgcgt 71940
cactaggaac atgcccacat atcacaatca aagttccagg atccagctga tcrggaatgg 72000
tgccaacata cgggattacc taagaaaaaa aatttaacaa atgtacgttt acataaacaa 72060

aattacttac taagcaatta caacccacac tcattattga tagagaaaaa ctacactttt 72120 

taaggctgca cataaatctc agaatccagc attttccaag tcatctttcc tattcaaaca 72180 

gctataaaaa ggaactgcgg ccgggcgcgg tggctcacgc ctgtaatccc agcactttgg 72240 

gaggccgagg caggaggatc acgaggtcag gagatcgaga ccatcctggc taacacggtg 72300 

aaaccccgtc tctactaaaa atacaaaaat tagccgggtg tggtggcggg cgcctgtagt 72360

cccagctact cgggaggctg aggcaggaga atggcacgaa cccgggaggc ggagcttgca 72420

gtgagccgag atcgcgccac tatgctccag cctgggcgac agagcaaggc tccgtctcaa 72480

aaaaaaaaaa aaaacaaaaa aaaggaaccg cataccgcat ggccaggata tctgcaatag 72540

ctgcaaacca ctatgagttc tttggaaaga gacacaaggt aaatactatt catagtattt 72600 

tgtatttggc tgagagtttg tgaagcaaaa cttctgctta atatgataat tctaacagaa 72660
aaaaaaaaaa agacttgttg caatgccatg tctactcatt ccttttccta ctgttccact 72720
gctgatccca acagaaggtt cgaggccaca ctaggcccaa agccaatgct gacggagaca 72780
atgacaagca ctcactgcct ctgaaggaaa ctccacgtta agccacgccc ccacacctgg 72840
gattccaggg cctgctcttc tctgctggac tcccagactg caacccagac tgcactgtta 72900
gaaaccagag aactgcatga tcatgaggat gagtgggtgc ctgtgggtct tcaagacatg 72960
gcatccacct gccgtggacc agtccagtct gcaggcgtgg actctgacag ctggctccac 73020
ccagtattca ggtctcaacc tgcaccctca ctgcccagaa cccagccctt tttttctggg 73080
acctgccaca ctgccagatc ttgtcattcc ccttccccag agatgactac actgtcttcc 73140

cagtccactg gctggggacc tgtgctatgt ggctgcctct cctgcatcac aaccatctgc 73200

ctccatctgg agcatctgac caggatgtac agcccacaca ccgttaagcc tcacactaag 73260 

ctcactcaaa ttacgatgtg catggtaaaa cctacccaag gtacttctga ttgtcaagaa 73320 

ataagaaata aaataagacg ggccacagaa aagtggttat aaattggtgg ctcctaaacc 73380 

gaaatcgcct tgaggcacag cctcctctgt ggagccttct tcaagctccc tgggctgctg 73440 

agtcagcccc tctcagaggg ttcatgatag cactttggat tctgtttgtt tgcatgtagc 73500 

tttgcctaga ctgtgagctg catggggcag aactggctcg tcacctttac ctccccagga 73560 

cacaggtcaa gagacgatcg gtaactgcat ggtgaatgaa tgaactctca cctgtctgga 73620

aagtgggctc tggcaggcct cacacataca gaggagaggc aacaaagcag ctgctgaacc 73680

gcaagctgag cccacaagct ctctgttgcc ttaggcaata agatgagaaa ttacggaagc 73740 

caattatcta tttgttgtca tggcaattgc taggagcagg gtgggaggca cgtgacacca 73800

gaaaacaaaa aatacaacag acagtgtaga ctggggctac agctgcacat cagagtatct 73860

gattttgtgt gtagagaatg gggaaggacc tacatcccta cattgatctt ttgggtcaca 73920 

gctggttcca agaatatata gcagcaggtt tcatagcgtt aatctcttaa taaaatgaga 73980 

agtttttata acatcaaatt tcatcttaaa attatcttta tgcagaatat ttaataacac 74040 

aaatgttagt aatataagag aaagtaatct atagagccat atatagtatg atctcaacta 74100
aataaatata aacattaaaa agaaaacctt gctaatacac actgtatcct gtatcaaaat 74160
atgccatata ccccataaat agatatacat actacatact caaaaaaatt aaaaattaaa 74220

aaaaaaaaaa gagaaacact aaagaaatca gacaaaaatg ataacaatag ttttcatggg 74280 

ggatgatggc attttaattt ttttcttcat atattttcat actttgcaga tgtctataat 74340
aagcatcgat atttttatag ccagaaaaat atttttaatt atatgtttca ttttagtgtt 74400
ggaaagagct ttttgagaca aaacaatcta ttcccatcac tttttttttt tttttttttt 74460
tttgagacag agactcactc tgtcccccag gctagagtgc agtggcgtga tctcagctca 74520 
ctgcaacctc tgcctcccgg ttcaagcgat tctcctgcct cagccatcca agtagctggg 74580 
actacaggca tgtgccacca tgcccagcta attttttgta gttttagaga gatggggttt 74640
tgccatgttg gccgggctgg tctggaactc ctgggctcaa agtgatcctc ccaaagtact 74700
ggggttgctt tgggaggtaa tgagctttgg gagaagctac cgcacccggc ccccatcact 74760
tctaatacct tgttcacaac tgtttatagg ttgttccagt ttgagaaaca actatgctaa 74820
gaacggtggc cacttgctgc tttagttttc tgggttcatc atgccttcta cttcagaaaa 74880
gatcctgcat tacttcagta atactttttt ttttttttaa ctatgagaca cagtgccctg 74940
ccgctgactc atttaaccca cgactatgtg gcttctgtga aaacggagga catacattgt 75000
tttctaggct actccaagct actttggtga ttcacagatc attattaatt tacatttaaa 75060

aaattaaaat acagatctga tcacactgga gacaatacca caagcttacg ccacagtcag 75120 

gtttgctgtt atttgccaaa acacccttgc tgattcatat tgctttgcct ttttgatgat 75180 

taaatgcaag gctctctatt aggcagactg cagcttgaaa gaagcctaag agtcagacac 75240
ccaggaaaca taacagacac ccaggaaaca tggagtgaaa atgtgatact ctatttggct 75300
gctgaaaatt ggtatccctt tccatatctg aaaatcccct agcttatgaa cattagacct 75360

catctagact gcaacaaggt aagagcaggt gacaaaactg cccattagaa cttccagctc 75420

agcttttaag taagaaccag tgacaagaag aagccacaat gacgcagtat tcactctggc 75480 

aagggggcag ccgctatatc tgggtttgga gaaagctaca tctggggtca aaggcagcta 75540 

catccggggt caggggcagc tatatcgagg gtcagaggga gctgcatcca gggtcagagg 75600 

gagctgcatc cagggtcaga gggagctaca tctggggtca gagacagcag tggcattggt 75660 

agagtagcag agacacagta ccatctgtgt ccagtggtga caacagatgc ctcttcatgg 75720

gaaccgtcct acagtatgct tgtgtatctg tggctaccta ttctccaagc ctggttcttc 75780 

atcctgctag agattctgag aactctgcac tatcatttag caaattcctt ttctgcttaa 75840 

agagttggct tctgttgctt gtaactacaa accctgacca attcaacatc tttaaaaaaa 75900 

aaaaaaatgg gggtcaggcg cagtggctaa cacctgtaat cccagcattt tgggaggcga 75960 

aggtgggcag atcacttgag gttgggagtt gaagaccagc ctgaccaacg tggagaaacc 76020

ccatctctac taaaaataca aaattagccg ggcatggtgg cacatgactg taatcccagc 76080 

tactcaggag gctgaggcag gagaatcact tgaacccagg aggcggaggt tgcaatgagc 76140

cgagatcggg ccattgcact ccagcctggg caacaagcaa aattcagtct caaagaaaaa 76200 

aaaaagggtg ggggagcagt aaagaagagt caaagaaatg cttcaggaaa atgggataag 76260 
gaattctaac tgaggtagcc aaagacagtt tatgaagaag atatgatttc agttgccact 76320
tcaggtagaa atggcagttt acaaaagcag agaggcagga aaagacaaga cccatcttct 76380
ctcagcacgc agtcagggag tagttaaaga aaggtacagc cagggtcaga ttttagagtg 76440
acatacgaaa gaagtgttta ggaacaacag tccagtgggg gtgaaataag ccagaggtca 76500

ggaggtcatc tgggtgactg actgctgcca taagcctggc gtgaaacgat aaggacacat 76560 
tccaggtcgg tggcaatggc cctggaagag cagagaagtc caagagatgt ttcaaaggaa 76620
actgtctaca taaggccact aaatacatga caaagctggg gagaaaacag gttttttgtt 76680
ttttgttttt tgtttttctt gagatggagt ttcacttttg ttgcccaggc tggagtgcaa 76740
tggcgtgatc tcggcttact gcaatctccg cctcccgggt tcaagcgatt ctcctgcctc 76800

agcctcctga atagctggga ttacaggtgc ctgccaccac gcctggctaa tttttttgta 76860 

tttttcgtag agatggggtt ttaccaagtt ggccaggctg gtctcgaact cctgacctca 76920 
ggtgatccac ccacctcggc ctcccaaagt actgggatta caagtgtgag ccaccgcgcc 76980
tggcctggga gacaggtttc aggaggaaga tgacgagttt agtttgtaat acttggtgtg 77040
aagcgttggc agagcatctg agagcaaatc tgctaagact tggagactca gaaccacacg 77100
ctaggtgaat ggccagagca cacatttaaa agtcagggta aagatcattt gtaaagatat 77160
caaagtagat aaagaaatcg aaaagcagga caggtcaggc agcaaaggct cggtggtttc 77220
taaaattagg aggtatgcag aaaagaatta acattggcag gcctgactgc tgtcctggaa 77280

aggcctcctt acgaggcagg cccctggctg gtgtctggga acttgggatt tgagaagggc 77340 

acccaccaac ctaactggta acggggattt cctgtacctg aactgttcat gcaaataata 77400 

tggttatgct aaacacctgc tttccttctg ggggaccaga attttttgat atgtgtgagg 77460 

gtgcttatgt gaccagcctc cagtaaaaac cctaggcact gaggctctga tgagcttccc 77520

tggcagtcca catttcacaa gtgttgtcac aactccttgc tggaggatgt aagcacatcc 77580 

tatgtgactc cactgggagg cgaccttgga acttgcactg gatttcccct ggaattcacc 77640 

ccaggtgcct ttccgtttgc aattttgctc tatatccttt caccgcaata aatcataggc 77700 

acaagtacaa ctctatgctg gtgagtcatc aaaactgggt gtgaccttgg ggacctcaac 77760 

atggggtgag aagagaaagg agagagaaga atcagtcagg ccagggcaat aatatcaagg 77820

gagtcaatgg atttataaat ttcagtaaga ggaaaactgt caatatcttt aacattgaag 77880

aagctaagaa ggatccaagc aaataaagat aataaaatgg ggccgggcgc ggtggctcac 77940

acctgtaatc ccagcacttt gggaggctga gatgggtgga tcacgaggtc aggagatcga 78000 
gaccatcctg gctaacatgg tgaaaccccg tctctactaa aaatacaaaa aaattagcca 78060
ggcgtgggtg gcgggcgcct gtagtcccag ctactccgga ggctgaggca ggagaatggt 78120
gtgaacccgg gaggcggagc ttgcagtaag ccaagatcat gccactgcac tccagcctgg 78180

gcgacagagc aagactctgt ctcaaaaata aatgaataaa taaataaaat aaataagtaa 78240

agaaaatgga ccataccctc tgacctagtg attccacttt tacagattta tcctacagaa 78300
atacgtagca caggcacacc aagatgcaca cacaaggatg tacactatag caccggatat 78360
aatcgcaaaa agacacagcc cttcacaagg gactgcttac ataagtttta ctgctaatta 78420

tgcagctgtt aaaataaatg aggaagttct aaataaaatg acatttccaa aataggtcaa 78480 

atagaaaaag cagggtagaa aacagtttat acggtaaaat gcatgcacgt aaagagacag 78540 

gatatataaa aggtgatata cacaacacac acacttttgg atattcataa gtatctcagg 78600 

aaggatatat ttctaaaaac cagtaaaaag tgattgccct tggggagtaa aactaagtac 78660 

cagggtagaa ggaaaatgca tgtttcactg aaactcttaa atactgtttg aattttaaaa 78720
aaatgatttg cacatattat ctttaaattt ttttaaaagg aaaagccact ggtttggaca 78780
taaatgtagt ttaaaatttt gaagacagac tatcttccaa ctattgggtg ctaggaaaca 78840
agggaggagg caaaagttaa acagcagaag gatcctgctg cagaaaactg caagaggaaa 78900
gctggtttgg gaaggaagat gcaagtttgg ttttgaactc ctagtcttga agggctggct 78960 
gagcaactga gagttggggg ttcacattag gtgaaaggtt ggaactggat acatatggaa 79020
aaaaaaattt atgagtgatc ggtaaagaca taaaagaaag agacattaaa gacaaaattt 79080
cactagtaaa accaaaaaag aaaagagagt gaaaaatcag agggcagagc cttagtgagg 79140
aaatccactc tcagtcagga aggcaggcag caaaaaaaga agaaaaacag cacaaaggct 79200
agaaggtacc tcacaggtta caaaagaaag acttgactgt gttgtgtttt aagactaaat 79260
tgtaaggtaa aaaaggagta cttagaatac taagcagaaa aggataaaca tggaagcaag 79320
gttcctaaga aaactgggta ccaggtagga tttatgttga caagcaactt attaacttgg 79380
agtaacaacc aattctgttt catttctaaa tgatttctta accagctagc atttaatcat 79440
atcttaagaa ctctataaac aatagcaaaa ttaaatatat aggtagataa tgaactgcca 79500
aacaagaaat ttcactggca gcattcctgg tatgaagtaa agatactgtc ttccaaatta 79560 
ttcattcaaa ttcaatagaa agaaaagtaa gtggccagcc acggtggttc acgcctataa 79620
tcctagcact ttgggaggcc gaggcgggcg gattgcctga gctcaggagt tcgagaccag 79680
gctggccaac atggtgaaac cccgtctcta ctaaaataca aaagaacaat tagcggggcg 79740
tggcggtgtg cgcctgtagt cccagctact cgggaggctg aggcaggaga attgctagaa 79800
cccaggaggc aaaggtggca gtgagccgag attgcaccac tgcactccag catgggcaat 79860
agagcaagcc tccatctctt aaaaaaaaaa aaaaaaagag aagaaaaaag aaaaaaaaaa 79920
cagaaaagga agcctaactt tttttttctt tttttttttt tgagatgaag tatcactctg 79980
tcgcccagga tgcagtgcag ttgtgcaatc ttggctcatt acaacctcca cctcccgggt 80040
tcaagtgatt ctcctgcctc aacctcccat gtagctggga ttacaggcca ccgccaccat 80100
gcccggctaa tttttgtatt tttagtagag acggggtttc actgtgttag tcaggctggt 80160
ctcgaactcc tgacctcaag tgatcctctc gcctcggctt cccaaagtgc tgggattaca 80220
ggcatgagcc gcagcacccg gactctgact ttttattttt atttttgaac atggttcttg 80280
ctctgtcacc cagaatgaag cacagtgtca aactcacagc tcactgcagc ctcaacctcc 80340
tgggctcaag caatcctccc acctcactct tcctcatagc tgggactaca gacatgcgcc 80400
acaatcccca gttaattttt gtatttttat agagagacaa ggtctcacta tgttgctcaa 80460
gctggtcttg aactcctggg ctcaagtgat ccactagcct tggcctccca cagtgctgga 80520
attacaggca tgagcctctg gacccagcct tagagcctga cttttttaag tcttggaatt 80580
ctatcagttg aactagtcaa agaaatatac taaataggga aaaaaataca taggccagtt 80640
gcaatattta cttcatcttt tccaagaaaa cttaatgtgg acagttggtc taacaccaat 80700
gtggaccaaa atattagaaa taggccaggc atggtggctc atgcctgtaa tcccaacact 80760
ttgggaggcc gaggcgggag gagtgcttga ccagactgag aaacatggca aaactctgtc 80820
tttaccaaaa aaaatacaaa aattagccag gcatggtggt gcatgcctgt agtcccagct 80880
atttacaagg ctgagcccag gaggtggagg ttgcagtggg gtaagactgc accactccac 80940 
tccagcctgg atgacagagc gagcctctgt cttaaaaaaa aaaaaaaaaa aaaaaaaatt 81000
ggaaacatta atgaaataaa aattaaaaag tggcggtgtg tttataatgc ctaataaaga 81060
atacaacaga caataaatat atttacttat tgaacctatc agtaacaaaa caaccatgat 81120
tcaagaaaac aggttttgtg tgtggggggg gttttttttt ttttgagatg gagtctcgct 81180 
ctgtcaccca ggctggagtg cagtggcacc atctcggctc actgcaacct ccgcctccca 81240 
ggttcaagtg attctcttgc ctcagcctcc cgagtagctg ggattacagg aggctgccat 81300 
cacgcctggc taagttttgt atttttaagt agagacggag tttcgccatc ttggccaggc 81360 
tggtcttgaa ctcctgacct cgtgatccac ctgcctcggc ctcccaaagt gctgggatta 81420 
caggtgtgag ccaccatgcc cagctgaaaa cagttttcct aagtctagct tataaaatat 81480 
actaattgat tatctttaaa taactaagtt ggccaggcat ggtgcctcac gcctgtaatc 81540
ccagcacttt tggaggtcaa ggtgggcaca tcacctgagg tcaggagttt gagactaggc 81600 
tggccaacac tgtgaaaccc catatctact aaaaacacaa aaattagccg ggcatggtag 81660 
cacgtgcctg cagtcccagc tactcaggag gctgaggcac tagaattgct tgaacccggg 81720 
aggcagaaat tgcagtaagc cgagattgca tcatttcact ccagcctggg tgaccagagg 81780
gaaactatat ctcaaaaaat ttaaaaaaaa aaattaattg aataaataaa ataagttatt 81840 
tacagccttt tttttggaac attgaaaggg caactagcta caaatgagag aaattcagtg 81900 
caatacaggc cctttatact ataaatattt tacagcagtg aaacgtaaag agagcgcaaa 81960 

atgtttttgc cattcagttc agtctacctc taacctttcc gactcttttg tgagaactga 82020
taataaaaca gagaattttg aaaagaaaga aaaaaaagct atttgccaaa aatatctccc 82080

ttggaaatgc attatcctca gaacagcatc cagcccatgc cacaagactg aaggcatttt 82140 

ctgctctagc gcagaactac ttctccagat ccccttcctc aagaaatgaa gtctacttat 82200 

ttttgttcca cctcaatagt tgagagtact gaccccagaa actacaggaa tcagcagtat 82260

gctagaatca agatatgcac gaattttacc tataaaatta tcttcttttc tgtgtgaagg 82320 

gcagaaatga acagtgtaac ctttatccat tctcccagct tgagccaaga tgatacttca 82380 

gacacccgtg gcaggcagcc tagtttgttg ttgttgttgt tgttgttaag atctttgcag 82440
gaaatcagtt tacaaccttg ggatgttttt aactctaaca tgcgcaaagt catcttaaat 82500

gtctcacaag cttccgcttc aggaagtcat cttttttaaa cttactacca ctgaaaggct 82560 

atttctccta aaactgactt tgcttgatac agcagcgata cctcattctt acacaatgac 82620 

attaaaactt agggaaaaag gaaatacagc tataaagtaa atgacaaaaa cttgaaccca 82680

cacacactaa caaaactggt ttagggcctc attttaagga ttctcaccct tccttttgcc 82740 
caagaatctt ctaggcggtt tactaaaaaa gtggctgtcc tttttcagac ctcgattcag 82800
gattcagctt cagatacgtg gaaactagac attcctaaag attctcacca ccacataaaa 82860
ctaaaacaag ctctttactg ctcaggatta cagggcaatt tccagcaatt acagtcattc 82920
agggattcta ggacctgcct gaactgcacg agacccttac tacttcacac tctccatctc 82980

cccattggct tttgacattt tccctgctca agggccaagc agtatttgaa aggctgaggg 83040 
aaagatcgag acattaactt ctcatggaca gctctaatta aaaaagaaaa tgaaaaactt 83100
gtagagtaag aaatccattt tcctttaaaa actacaattt atgattagct gagcctcctc 83160
ccatcaccaa aagttggcat tccctccact ctacccagac gttccctgtt cataacactg 83220

tttcatcacg tcatattatc attgtgactt cttcctccac tagaggacaa gagctgtttc 83280 

gtaatcagca cccaccacca tctctattac atagtaggtg ctttaaatat gttcactggc 83340 

ttttattctt gccctgtctc ccaatggata attaatattc tattggatct gtcctggcat 83400 
aggtaaaaag ttatcttata gaaatcagtt accgggttat agatgatatt ctgtaggttg 83460
tttaaggaca acatcattct tttccagctt cttgtcgatt ggagtctctt ctgtgtatga 83520
cctaagattt taggcaagtt tcatttaaag gttacctgga ttgaaactga ggcactggcc 83580
ctgtgtaaag taaaaataga ggaaaagaaa agtaagcatg tagcattttt cttcatatcc 83640

tattttaaaa tttaaattat atattatgtt actgatattt accaaataat agaatattat 83700 
tacaatcaac gatctgcctc ccattcttac cagtgtgctc atatactaaa taattttgga 83760
ttgtgagtgt aggaaatatt gacttaaaaa atacatcagt agaaaatttc taacatggaa 83820
tttattatta aaaatactaa aataggccag gcatggtggc tcacacctgt aatcccagca 83880

ctttgggagg ccaaggcggg tggatcacct gaggttcagg agttggagac cagcctgacc 83940

aatatggtga aatcccatcc ctactaaaaa tacaaaaatt agccagacgt agtggcatgc 84000 

acctgtagtc ccagctactc aggaggctga gacaggacaa ttgcttgaac ctgggagacg 84060 

gaggttgaag tgagccgaga ttgtgccact gcactccagc ctgggcaaca gaacgagact 84120 

ccatctcaaa aaaaaaaaaa atttaattaa ttaatggtaa atactaatca aacagtcccg 84180 

tacaattatc agaggtattc atttaaattt tcatttccat aaaatgagaa ttacagtatt 84240 

cacatcattg gtttgttctg aggattgagt taataaaaca gcgaaagagt aagcgctatg 84300

ttagctatta ttattgtgaa tagaaagaat tgctcttcct cctccaattt aaacaaatca 84360 

aagtagggaa aaatccaata cttttaatac tattaagata cagttttctc tgttgcttaa 84420

aaaaataata atcacagggc aggggagtgt tggaaagcat cagccacatt ttttaagata 84480

aaagcactca tggacactac actacattta atagctccag gaaaaactcg actttaagca 84540

gaactaaagg ggaaatgaaa ccagagcttc ctgtatttta cttccagcaa ttctgtcatt 84600

atactgcaca ccaacaatac acaccgatca aatctatcac tttttcttta ttaagaaaaa 84660 

aaactgtatc cctcttggtt taccacctaa atatagcccc atgtcattaa cttaattcgt 84720 

tagtcaaaac ctcaaaactc tggctccgtg actcaattca ggaagtaaga acaagagcaa 84780 

aaagaatgga tgccgagttg ccatacacat gtataataac aagccagtga cccaatttaa 84840

gccatctgct tgcattaaat cacgcaaccc ccgaagtatc cccagggaca ggtcccgcca 84900 

gcatgaacac ttcgtatgca tcacaagcag ccatcactta agtttcacgt acggtcaaag 84960 

gaagtcacat gacttgcgct ttgcaatgtt taacactgca gtcaaatgac tcggcatcct 85020 

aaagagcgtg ttagaggcag ggaacgcaat ggaggtcact ccactgtcac tacaaattcc 85080 

gggaaggaaa cttccccaga ttcctccact tggaggtggc gctcggcctc aggctaggag 85140 

ggaacaggtg agaaagcagc ccaggtgggg tgggtttgca gcgaggagac accccagggc 85200 

aaacagcctg accccagcca gggatgtcca agaaaggccg cgactcctga taatccctta 85260 
tgccccggag cgcctcgcct gcagaggcag cgtccccgcc acccagcccc ggctctgccg 85320
cggtgaggac cggcgggtcg gggtggactg gacactgtcc cacccatcaa atggtgattt 85380
aggagccgtg acatccgaat gccatcctcc actggcgaga ccctcagagc agccacgcct 85440

ctagcgactg ccccgccacc cgaggccggg ggtcgcgcga ctcacccaaa gactggtgtt 85500 

tcaggcgctc cacggagcag gttgtttgtc agcagctaag tgccgtcagg gttcccggct 85560 

ctggcgtccg tgggcggcta cgggaagcga caggagtcag tcctcgttca cttcccggct 85620 

cgcgcgcctc actgctgtgg tctccccacc ctccccgcgc cccgccttct gtgtctgggg 85680 

cgtccctggc ggctctgctg grttttggac agggacccgc cgctgatcgc cacccagctc 85740 

ggcctcctgc acagcctctg gagccttgga ccgcgactgg cttgctgtgg gacgagcaca 85800 

gagggataag gacaaagaat gtgtcctggg tggatctggc tgcctttgcc cggaaggcgg 85860

agtggggtgg gaggtggtag gaaaatggga aggaaagaaa agaaaggtgg gccgacgtcc 85920 

acctggctgt tccaggcctc caggtctagg agggagggcg ctcggggctg ggacttttca 85980 

ggaccagggt ggtcaccgca caggccccgc ctgcctggac caagcgctgg ccttcccggg 86040 

gcgcccaggt ccacggggtc aacgccaggg ttttctcagc ttcctcgtct gcctcggatc 86100
caagtccaga cagtgccaga agagacttgg aggcgctgct ttttgacagt acacacctct 86160
gtatgcaggt gaaacggtgg gggaagggtt cagtacgctg gactgtgccc agcccaagct 86220
ccccatccgt tagtgataac ttggactcgc agccactccg cgtcactcgc cggttatcct 86280
gcgtgtgggt gttttctcca aattggacac ttagggaaca gtttaagcag tatggagcac 86340
aattctgtgc ctattagatg ctcttaaata cccgacttct cagggcccta cactgactga 86400
tagtttgacc tattggctgt aaacacacca gccagaaata caaataaagt taaacaaagt 86460
catgtcagcc acttggaatg gctggctgct tactgtttat tttstgttag ggacctcaaa 86520
gttctcatct tctctacttg gcttttatcc acatttgttt ggaaagatat attttagtgt 86580
cattaacgtg ggctttctcc tccctggctt ttgttttcat tattttttgt tttaattgag 86640
gaattcatta caagtcccat aactgggaac tcatgttctg attcactgtt gcttttctct 86700
cctccttacc cattttccaa gcctgatgca cttctcaaga tccaatctga ataggtgaat 86760
ttacccttga catacactgc cttggattta tagctgtctt ttggtatgtg ttgtgtgttt 86820
tatctatcca actcgactgt agtttgagaa aaggaacttg tcttatacag tctccagtgc 86880
atagtgtaat gctttgcata aaatagatat caatgttagg ttgaaatgtt agattattga 86940
taaagtcagg agcaattgaa tatcttcaca attctgcctg agtctccccc cagccccgcc 87000
actttgagac agggtttccc tcaagtgatc ctcccacctc agccccagga gtagctggga 87060
ctatagacgc acaccactat gcccagctaa tttaaaaatt tttttttgta gagacagggt 87120
ccactgtgtt gcccrggctg gtctcaaact cctgggctca caatatcttc ctgccttggc 87180
ctcccacaat gctgaaatta caggcatgag acaccaagct gggccctgag tcaaccttgt 87240
acttctttac aaagctaaag taagttgaaa taagtacaca taaatgctgt gtttcttttg 87300
taagtcatat ttaggtgcaa acggggctga gtgaaagggc cagagagcag agttaacaaa 87360
gaaaatctgg attaaaatgt tcataaatcc taagaactta ttcctagaaa tatctccctc 87420
atttggcact gttgttcaat tagcccagtg tgttatttta aataagcaaa tattcactga 87480
atgcctatca tggccaggca cagcagtaaa gaatactagc atgcaaagga cctcatgggc 87540
ccttctagat agagtcccac atctcttaca ttattactgc agtatcctaa tcggtctccc 87600
tatatttgcc cttgcccctc tgcaatgtag tctcaacaca acagccagag aagtcatgac 87660
aaaacataat tcagatcatg ttaatccctt actaaaacct actttttttt tttttttgag 87720
gcagtctcac tctgttgccc aggttggagt ggagtacagt gttacattct cagctcactg 87780
cagcctctgc ctaccgtgtt caagtgattc atgtgcctca gcctcctgag tagctgggac 87840
tacaggtgcg cgccaccatg cccagctaag ttttgtattt ttagtagaga tgaggtttca 87900
ccatgttgcc caggctagtc ttgaactcct gacctcaagt gatccgcccg cctcggcctc 87960
ctgtaacctg ggattgcagg tgtgagccac tgcatccggc caaccgtcta ctaatttttg 88020
ctcattcttt gagtaaagat ccagttctta cacgagcact tcctgatctg cccgaatcct 88080
cacacactgt gatttcatct cctactactt tccttctcac tttttctgca gcagccacag 88140
cgactttgct attctcctag caggcctgtc acactcccat cctaaaatct ttactcaata 88200
tttccttttc ccttttcaag tctactcaaa tgtcagttct cagtggggcc cttcctgacc 88260
accaatattt aaaattgcaa acactctccc aaaacacact catacacact ccctagcaac 88320
tcttccttgc tttatttctg gctaacattt gtcatgctaa tatactatat aaggtgcctt 88380
tttattttat gttacttcac ccatcataag actataaact ccatgagggc aaggagtttt 88440
gattgttctt gtttcgagga agcttatact ctagttgagg acacatacag tgatgtgttt 88500
tagagaagat taaattggga gcacctgagt ggcaacctta gttgggtgac aaggaaagtc 88560
tctcagagga gtcaaagttt aagctgctgg tgagtgaaag aaaggaaaca gccatatgaa 88620
gatggtgggg aagagcattc cagccttagc gagcagctgc tgcaaagtcc ctgagggtct 88680
tgagaatgca ttaaatgtgt tcaggaaaca aaggccacca accagtatgg ctgaagcatg 88740
agggaagaat gatatgacat catgagattg gatggcacca cctaggagag tagtgggaga 88800
aaatcacctc ggacagccct ggggagataa gacaacgcca ataaatgaga ctgagaaggg 88860
acaaccaggg gaatttgctg agagtggtgt cacagatgcc aaagaaagag aaacagcctg 88920
tgtgccaagt gtgctgagag gccaaggagg acggaagtaa gcaatgggct taggaacatg 88980
gatgtcattg ctgactctga ctctgaagtg tgttgagaaa tgggaggtga agaactggtg 89040
gaaaccatga ctataaagat ttcagaaagg tttgctgtga ggaggaatgg agaagcagca 89100
ttatagctga ggaaggagtg gagcaaaggg cagattcagg ttttctcctg ttgttttttt 89160
ttttttttaa agagaagata ctacattggt gacttttaga cttctttttt ttcaagtgac 89220
ctgccataag aattacattt ccattgagac ttactacaca catacttaaa attacgaaaa 89280
caaaaatttt gaccaggcac gttggctcac tcctgtaatc ccagcacttt gggaagccga 89340
ggcaggtgga tcacttgagg tcaggagttt gagaccagcc tggccaatat ggtgaaaccc 89400
tgtttctact aaaaatacaa taattagctg ggccttgtgg cgcatgcctg taatttcagc 89460
tactcgggag gctgaggcat gagaatcgct tgaacccaga aggccgaggt tgcagtgagc 89520
tgagattgta ccactgtact ccagcctggg caacagagtg aaactgtgtc taaaaaaaaa 89580
aaaaaaaaaa aaaaattgta caaaacaata ctcctgttat gtgtaaaaca atctactttt 89640
tcattttaaa aaaatgctgc ttatagtcca ctaaaatgtt tgcaagcccc actaatgggc 89700
tgtgagctga agtttgaaga acactgggta gtatgttggt gttagctcat actggcttgt 89760
gacagccatt tttgtacata acttccatgt tgataccttg aagttatcca gggcaactgc 89820
tagaacttag gaactttttc ccccaaagaa caagatctta aacatttacc agcacgctac 89880
tgactttact catgtgaatg atccgtggag aggagaaact gatcatgcag gagaggggat 89940
ggtgttccca ataaaagttg cgaaaaccag tttcagaaca atagatattg ccaattattt 90000
gccacaactg cctgagagtt atgcagataa aaggatcact tctggtagga tcaatttctt 90060
tttcatctat attgtcctct ttccttgccc aagaagccac atttatatgg tataatatag 90120
ccacacattt agaaatagtc tttttttttt tttgagacag agtctcactc tattacccag 90180
gctggagtgt tgtggtacaa taatggctca ctgcagcagt gacctcctgg gctcaagtga 90240
tcctcccacc tcagcctccc aagtagctgg gactacaagt gcacaccacc atgcctgact 90300
aatttttttt ttttttggta gtgatggggt cccactgtgt tgcccaggct ggtcttgaac 90360
tcctgggttc aagcagtcct cctgcctcaa cctcctgaag tgctaggatt acaggcatga 90420
gccaccatac ccaggcaaaa ttaagtctta caagcaaaat gttaaattat atatatagag 90480
cccacatttt ttaaaaaact agtaaaaaaa tcatgatttt agacatgaat tgtaagctct 90540
tgctacttag cagggttaac tgattaaaca gaaatagatg ccatattcta tcgatgtgaa 90600
ccgtattgtt tcaagctaag gaatactctg gcagtgattt tgcttattgt tttgttgctg 90660
tgttactttt gttaacataa ttgtaaaatt tttggacccc ctctccaacc atccatgaaa 90720
tagcggtgct aagtgctggg aaaatgtgct ccaactcaat aattttggaa tacaagttag 90780
atatgataaa agagtatcaa gaactttaaa gaactgcctt ggtatgctgt gaatttagaa 90840
accagaactc cgtcacagaa cctgtgacaa taatgataaa aaaaaatatg tgaaagcaaa 90900
aatttgccac aatggattat acatttgtga gcgccgagaa cacataccct gctattctca 90960
tgagaaaatt aatcaaaata tgtaggtttt aaattatgca acatcttcag gaatgcattc 91020
tttacaaaac cacaatactc acatccttat acatacatct gcttccccag gaagtgtcat 91080
ggacggtgtc ttytacctct tccgattatc ttccccaatc taarggaatt catttctcct 91140
ttgaatcctc ttgctttgac ctaatctccc ctacacccca atactagccc catccatgag 91200
caccaaacyc ttttttttca ccctccagac cccctatgat ctgattcacc aggcttacct 91260
cygaagttct acaggatcat gtcccaaatc cagtcttttc aggtgggaga aacaagcttc 91320
tagaactatg gttttgtcat aaaataaaag aatcttagtg acgagaggga tcttaggagg 91380
agtataaatt aattcatctc aatagctcaa aggatgagat agcctatttt gtgaaataca 91440
ttttttgaat ggcttacaga ctatgatgtt agtactaaaa aatgctgaat tatttgatat 91500
gaggaaaatg tatctgaaat tatgtaaaat gtaaagacaa aatgatacta aaaatgtata 91560
aatagtatac atgggccggg cgcggtggct tatgcctgta atcccagcac tttgggaggc 91620
cgaggcagat ggatcacgag gtcaggagtt cgagaccagc ctggacaaca tagtgaaacc 91680
ctgtctctac taaaaataca aaaattagcc aggagcggtg gcaggcgcct gtagtcccag 91740
ctactttgga ggctgagaca ggagaatcgc ttgaacctgg gaggcggaga ttgcagtgag 91800
ctgagactgc gccactgccc tccagcctag gtgacagagc aagctctgta aaaaaaaaaa 91860
aaaaaaaaaa [illegible] acagtaaaaa aagtgcatat gtatatgctg tatatatcca 91920
gtaacagtca gacagtaatt tacatacaca tatttagcaa agtgcaaaag aatgatgttt 91980
aatgtagtgt tcattgttat cttttctctt gcaaaattac tgagattcat taaaaggctt 92040
ccctcagcaa ggcagtctca atatttaacc aacccttcca gcgcatagct gatctcttcc 92100
agcttgtgtt tacacagttc attgtaaagc aacaagtaaa acctcaggaa tttctatggc 92160
acgttagagc aagcaaaaaa attgaggtga tttttttaaa ttaaaaaaaa gctgttgaat 92220
ccaaccaagt actcttccaa aaatatttta tctgggagta ttttaaaaca tacacaagag 92280
gacctcctct ttcggctttg gagccccctc cctctgtctc tgtacggggg agcgtcttcc 92340
ttcagccttc tcttttcttt cttgcctatt agactctctg ctccttaaaa ccaaaaccaa 92400
aaacaaacaa aaaaacacac aaaagtagag agatggtgtg ataaatacct gtgctttcat 92460
cgcccagctc cagcaattat cgacgtggcc aatcttgttt cacctacacc tcacccactt 92520
cctccccacc actggttcat tttgaagcaa atctcagact tcatttaatc tgtaaaagct 92580
tcaaaactaa ttgttagatt taaaggttta ataaggtccg gctgggctta attttttggc 92640
cagaagactt tacaggtgat atgtagctcc aattgcaact catcaggaga cagaaaatat 92700
ctggttgttt ttctttttat gacatcaaaa ccagttagtg gtttcagatg ttgttagcct 92760
gatccatctg ttacgaagtt cccatcagtg tttgatcaga tgattttagc atctagctat 92820
tgatggtcat tgcctagata cgttatttca ttaggaattg aaaatggtga tatttcaatt 92880
ctatcatttc tcctgcattt attagtaaga attctttttc tcctttctct ttctttttct 92940
ttcttttttc tttctttctt tccttctttc tttctttttc tctttctttc tctttccttt 93000
ctctttcttc ctttcttcct ttctttcctt tctctctctc tctctctttc tttgacaggg 93060
tctcactctg tcacccaggc tggagtgcag tggtgcgatc ttggctcact gcaacctgtg 93120
cctcccaggc tcaagcgatc ctcccacctc agcctctcga gtagctggga ctacaggtgc 93180
atgccaccat gcccggctaa tttttgtatt tttgttagag atggggtttt gtcatgttac 93240
ccacgctggt cctgaactcc agagctcaag ccatctgcct tcctcagcct cccagggtgc 93300
tgggattaca ggcctgagcc attgcgcctg acctaagaat tattttatta ggaaatagtg 93360
cttgataaat aatgcttctt tttgtatatt tgccacagtg tatctttaga tagattccta 93420
gaagctgcat tttttgtgtc aaaggttaaa tgccatgtaa ttttgctaga tatgagtcca 93480
agcacttctt gagaattcat ttaaattctg tctctgctct accatagggg aaggtccggg 93540
ctgttctgat gggaaagagg tattggaggt gaggcccaag aggctgtagt gggaaacctc 93600



gcacctgaaa tgacaggaga atccaaattc agggagcgtg tgggatcagg agccacatga 93660 

aaaaccaaag gccaggagcc aggaaggaaa tctggggaat ttcaaatagg gccaagagca 93720

gatatggaag cttccatcca ggaacataaa tgtgggaaaa atgaatacaa aaacaggctt 93780 

ggaacaaatt ggggagggtc caaggtcatt accccaaaca gcagctgctc ttttacaact 93840 

atttttcttg gctggctgga acataagaca aaggcacagg gctgtttgca catgtttctg 93900 

tcacgccgag ggcagctaac tgaaggagga tgtggtagct gaaacctagt ctgtacttta 93960

gccactgccc ctccaccccc aaaaggatag gagtgaaggg atgaagacca cctttttcta 94020

atttgcacaa agatgcattt ggcctaacaa aatgggcaag aattatccca aatctccttc 94080 

cacttttgca gttatattca tatctttctt cataatttag ctataccatg gcacttttaa 94140 

actcgatctg tagtaggaag gtgaggctaa atgttatggt cctttgcatt ttgatccgta 94200 

agcaaacagt tgttgtttat tttagaaaaa tggtttccag gtgtaactgc caactgctga 94260 

aaacttaggg ttatgtgagg tgaggcatgt tgatgcttta gtttatttgg agatggggga 94320

agcaggaaaa acagcaaacc attgcagtat ctggaattga tatggatctt tgtgtttaag 94380

acagggaact gaagcctggc tgtaccatac atactttaaa catttatgct tatgtaactg 94440 

ctaatcgaat tttgaaaaac tatataactt ttcacacttt ttacaaggat gtttaggttt 94500 

aatgagttga aaagatatac attctagaat attgtaaata tgacatttta aataaaaatt 94560 

gttacaccac tcttttaaat gtattaaatg gggccggatg cagtggctca cacctgtaat 94620 

cccagcactg tgggaggctg aggcagggag attgcttgag gccaggagtt tgaaaccagc 94680 

ctggacaaca tagtgagatc ccacctctac agaaaatttt aaaattaatg tattaaatga 94740

aatattagca ccaaagtgat ttgatattca ccatcatcca atggaaaaaa agaaaaacac 94800 

tgccaagctt ttctttaaaa gaaaacccca aggacaacca gcagaaggat tttacatctt 94860 

cattttacat tgctcctttc tctcttgaaa atgtatttcc atcccattcc cgcaaataat 94920 

tttatctagt gtaatatatt tttaacgctt aaaagccttt cgttgatcat tcattatgtc 94980 

tctgcaacaa aaatattaat ataaattaat aattctgtgg tcttcaattc ctacagtctt 95040 

aaggctctaa atgttcaaga ttctttcaat ttagttattt ttacaagtct ttttattgtt 95100 

accatgatcc atacacaatc aaaataaata aattttatca ttttgtaaat cattgttaaa 95160 

caaaatttta ttggaaagta tcattttaat gagagagggt atttcagagc ctttgttaaa 95220 

gaaggctctg caggcatcag cttgaatttc ctttacttgg gaaggtgggt tttttatatg 95280

tctcagggca ctgcataata ttaaaataaa ggatgggccg gtgcagtggc tcacacctgt 95340 

aatcccagca atttgggaag tcgaggtgga aaagcgcttg agcccaggag ttcgagacca 95400 

gcctgggcaa tacagtgaga ccaccatctc tacgaaaaat aaaataacta actgggtgtg 95460 

gtgacacacg cctctagtcc cagccattca ggaagctgag gtggaagaat cacttgagcc 95520 

gggaggtgca gtgagctgtc atcagccacc gaacttcagc ctgggcgaca gagtgagatg 95580 

ctgtctcaaa aaatatgtat atactatata tatatacaca cacacatgca aacacatata 95640 

tatacacaca cacatcttat atatatacat cacatatacg tatttgcgta catatacaca 95700 

tatatagaca cacccatata tacatatata gacacacata tatgatgtat atatgtacac 95760 

acacacgtgt atatatacac atatacacac atatacacat acacacatgt atatacacac 95820 

tatatatgtt tacatagcat atatgtatat atcatatatg catacatata tatgatgtgt 95880 

gtgcatgcat atgtagggta tgtatatata gaatatgtat ggggatatat atatatgatg 95940 

gggggtgaaa gattttggta aagcaggaga agggcaatta tgaaatgaga aatagaaaaa 96000 

gagccagctt aatgccttaa ttgcagggac tttctgtctc agaccaatgt tcagaaaaga 96060 

gtacaaatgg aggttgatgg tccccacctg aagaccccag gcagggtcct cacctacccc 96120

tagggttgtg cataccccaa ctggaagacc actggcccat gtaatattag gtgagatcct 96180 

ttatctagaa atggagagta ataaaaccca ccttgcagag ttgtgaggac taaacaagag 96240

aatctctgtc cacagcttgc ttgtattatg ctgtgtaaac acagggtaaa tggacattgc 96300 

tgtctgagtt gggcatttat tgttattgct attcttattg gtggtaaaca tgttatgaat 96360 

aattaagata agggatgagg aatatttgtt gcaagttctc aatgtacctt tattctaacg 96420

gtagagttgt aattgtctgt tttcttgtct gtctctattc ccggacttgt tggctccttg 96480 

ggttgggatt gtcagagttg tcattgtatt cccagaagtt aacagagggc ctgactacag 96540 

gaagtgctca gtaaatgttt gttgactgaa ttaatgtgat ttctcctatt agtgtctatt 96600 

taacattaaa acgagaaaca gcagtcatct aaaagaggta gaagccacta ggccaaacct 96660 

atcccttcag aaaaatattc cccttttgac tgatctggtt cttttcagag acccatacta 96720 

agagaaagaa ccaattcttg ccacttattt ctctttgtca aaggaaaatg ggtttcataa 96780 

ttgtttttgt ttgcactact gccaacatgg gccattgcaa agctcaggtt gagtgtttac 96840 

atagacgtaa ggtatacttt agccttggga gcactataaa gacatgttgt tgtcttgata 96900 

aaaagaaaga aagggccagg tacggtggct catgcctgta atcccagcac ttcgggaggc 96960 

ctaggcaggt ggagaatgag gtcaggagat caagaccatc ctggccaaca tggtaaaacc 97020 

ccgtctctag aaaaataaaa aaattagctg gcgtggtggc acacacctgt agtctcagct 97080 

actcaggagg ctgaggcagg agaattgctt gaacccggag gcggaggttg cagtgagcca 97140 

agatcgcacc actgcattcc agcctggcga tagtgcaaga ctccatcaaa aaaaaagaag 97200

aaagggagga aaaaagaaag aaagagagac agagagagaa agaaaagaaa gaaaagaaaa 97260 

gaaaaggctg ggcatggtgg ctcatgcctg tagtcccaga tactcagaag gctgaggcag 97320 
gaggattact tgagccgggg aggtagaggc tgcagtgaac tatgatgacg tcactgcact 97380

tcggcctgga cgacagcaag accctatctc aaaaaaaaaa aaaagaaaga aagaaaatta 97440 

acaagcaaag gaagaattct tttttaaaag tttgagagtt aatactctaa tgcgtaacta 97500 

tgcttatctt aagtttagtt agtcaaattt tatcgaatcg aaactgaagc tgttaggttt 97560 

ctgcatgtgt aaaacctggc tcctaaagaa ctccagattt ccttccagtt ctaaaattaa 97620

gtttatgcat caatttacgt ttatgcatag cacatgcatg cc 97662 

<210> 2 

<211> 6782 

<212> DNA 

<213> Homo sapiens 



<220> 

<221> 5'UTR 
<222> 1..112



<220> 
<221> CDS 
<222> 113..6547



<220> 

<221> 3'UTR 
<222> 6548..6782



<220> 

<221> allele 
<222> 178 

<223> 5-382-162 : polymorphic base C or T 



<220> 

<221> allele 
<222> 2677 

<223> 5-383-184 : polymorphic base G or T 



<220> 

<221> allele 
<222> 5193 

<223> 5-370-197 : polymorphic base A or G 



<220> 

<221> allele 
<222> 5243 

<223> 5-370-247 : polymorphic base C or T 



<220> 

<221> allele 
<222> 5673 

<223> 5-373-164 : polymorphic base C or T 



<220> 

<221> allele 
<222> 5731 

<223> 5-373-222 : polymorphic base A or G 



<220> 

<221> allele 
<222> 6011 

<223> 5-375-200 : polymorphic base A or G 



<220> 

<221> allele 
<222> 6162 
<223> 5-376-266 : polymorphic base A or G



<220> 

<221> allele 
<222> 6271 

<223> 5-377-227 : polymorphic base A or G 
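The allele features above give nucleotide positions within SEQ ID NO: 2, whose coding region is annotated as starting at position 113. The following is a minimal sketch, not part of the original listing, of how such a position could be mapped to a codon number and how an IUPAC ambiguity code in the sequence (for example y for "C or T") could be expanded into the residues it may encode. The function names `locate`, `expand` and `residues` are hypothetical, and the standard genetic code is assumed.

```python
from itertools import product

# Standard genetic code with bases ordered t, c, a, g (assumed here;
# the listing itself does not name a translation table).
BASES = "tcag"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    b1 + b2 + b3: AMINO[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}

# IUPAC nucleotide codes: plain bases plus the two-base ambiguity codes.
IUPAC = {
    "a": "a", "c": "c", "g": "g", "t": "t",
    "r": "ag", "y": "ct", "k": "gt", "m": "ac", "s": "cg", "w": "at",
}

CDS_START = 113  # taken from the "<222> 113..6547" CDS feature


def locate(position, cds_start=CDS_START):
    """Return (codon number, base within codon), both 1-based."""
    offset = position - cds_start
    if offset < 0:
        raise ValueError("position lies upstream of the coding region")
    return offset // 3 + 1, offset % 3 + 1


def expand(codon):
    """List every unambiguous codon matched by an IUPAC-coded codon."""
    choices = (IUPAC[base] for base in codon.lower())
    return ["".join(combo) for combo in product(*choices)]


def residues(codon):
    """Return the set of one-letter residues the codon may encode."""
    return {CODON_TABLE[c] for c in expand(codon)}


if __name__ == "__main__":
    for pos, codon in [(178, "tcy"), (2677, "gtk"), (5193, "arc")]:
        print(pos, locate(pos), codon, sorted(residues(codon)))
```

Under these assumptions, position 178 falls on the third base of codon 22 (written tcy in the listing), where either allele encodes Ser, whereas position 5193 falls on the second base of codon 1694 (written arc), where the two alleles encode different residues, consistent with the Xaa shown at that residue.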



<400> 2 

ggttgggctc cttggtacca tgtgggaagc gctgtgaaga gttgttgcct tccaagatat 
acccaaattc ccagttccag cccgtgtcat taaaactccg ctggcgtgaa ag atg acg 

Met Thr 
1 

tcc tta gcc cag cag ctg caa cga ctc gcc ctc cct caa agt gat gcc
Ser Leu Ala Gln Gln Leu Gln Arg Leu Ala Leu Pro Gln Ser Asp Ala

5 10 15

agc ctc tta tcy aga gat gaa gtt gct tct ttg tta ttt gac cct aag
Ser Leu Leu Ser Arg Asp Glu Val Ala Ser Leu Leu Phe Asp Pro Lys

20 25 30 

gaa gcg gee aca ate gac agg gac ace gec ttc gee att gga tgt act 
Glu Ala Ala Thr He Asp Arg Asp Thr Ala Phe Ala He Gly Cys Thr 
35 40 45 50 

ggc ctg gaa gag ttg ctt gga att gat cct tec ttt gag cag ttt gaa 
Gly Leu Glu Glu Leu Leu Gly He Asp Pro Ser Phe Glu Gin Phe Glu 

55 60 65 

gca ccg ttg ttc agt cag eta gca aaa acc ttg gag cga agt gtt cag 
Ala Pro Leu Phe Ser Gin Leu Ala Lys Thr Leu Glu Arg Ser Val Gin 

70 75 80 

acc aaa gca gta aac aaa cag ttg gat gaa aac att tea tta ttc ctt 
Thr Lys Ala Val Asn Lys Gin Leu Asp Glu Asn He Ser Leu Phe Leu 

85 90 95 

att cac ttg tcg cct tac ttc ctg ctt aag cca gca cag aag tgt ctg
Ile His Leu Ser Pro Tyr Phe Leu Leu Lys Pro Ala Gln Lys Cys Leu

100 105 110

gag tgg ttg att cac agg ttc cat ata cat etc tat aat caa gat age 
Glu Trp Leu He His Arg Phe His He His Leu Tyr Asn Gin Asp Ser 
115 120 125 130 

etc att get tgt gtt ctg cca tac cac gag aca aga ata ttt gtg cga 
Leu He Ala Cys Val Leu Pro Tyr His Glu Thr Arg He Phe Val Arg 

135 140 145 

gtc ata cag ctt eta aaa att aat aat tea aag cac aga tgg ttc tgg 
Val He Gin Leu Leu Lys He Asn Asn Ser Lys His Arg Trp Phe Trp 

150 155 160 

ttg ttg cca gtt aag caa tct gga gtg ccg tta get aaa gga act ttg 
Leu Leu Pro Val Lys Gin Ser Gly Val Pro Leu Ala Lys Gly Thr Leu 

165 170 175 

att acc cac tgc tac aaa gat ctt gga ttc atg gat ttc att tgc agt 
He Thr His Cys Tyr Lys Asp Leu Gly Phe Met Asp Phe He Cys Ser 

180 185 190 

ttg gtg aca aaa tct gtg aag gtt ttt gct gag tac ccg ggc agc tca
Leu Val Thr Lys Ser Val Lys Val Phe Ala Glu Tyr Pro Gly Ser Ser
195 200 205 210

get cag ttg agg gtg etc ttg get ttc tat get tct acc ata gtg teg 
Ala Gin Leu Arg Val Leu Leu Ala Phe Tyr Ala Ser Thr He Val Ser 

215 220 225 

gcg ctg gta get gca gag gac gta tea gac aat ate ate gec aaa eta 
Ala Leu Val Ala Ala Glu Asp Val Ser Asp Asn He He Ala Lys Leu 

230 235 240 

ttt ccc tat ate caa aag gga ttg aaa tea tct tta cca gat tac aga 
Phe Pro Tyr He Gin Lys Gly Leu Lys Ser Ser Leu Pro Asp Tyr Arg 

245 250 255 

get gca aca tac atg ata ata tgt cag att tct gtg aaa gtg acc atg 
Ala Ala Thr Tyr Met He He Cys Gin He Ser Val Lys Val Thr Met 
260 265 270

gaa aat acc ttt gtg aat tca ttg gca tca cag atc atc aaa aca ttg
Glu Asn Thr Phe Val Asn Ser Leu Ala Ser Gln Ile Ile Lys Thr Leu
275 280 285 290 

acc aag att ccc tct ttg ate aag gat ggg tta agt tgc ttg ata gtg 
Thr Lys lie Pro Ser Leu lie Lys Asp Gly Leu Ser Cys Leu lie Val 

295 300 305 

etc ctg cag aga cag aag cca gag age ctt ggg aaa aag cca ttc cct 
Leu Leu Gin Arg Gin Lys Pro Glu Ser Leu Gly Lys Lys Pro Phe Pro 

310 315 320 

cac tta tgt aat gtt cct gat ctt att aca ata ctt cat ggg att tct 
His Leu Cys Asn Val Pro Asp Leu lie Thr lie Leu His Gly lie Ser 

325 330 335 

gaa act tac gat gtc agt cct ctt ctg cgt tac atg ctt ccc cat ctg 
Glu Thr Tyr Asp Val Ser Pro Leu Leu Arg Tyr Met Leu Pro His Leu 

340 345 350 

gtc gtc tec ate att cat cat gtt aca gga gaa gaa act gaa gga atg 
Val Val Ser He He His His Val Thr Gly Glu Glu Thr Glu Gly Met 
355 360 365 370 

gat ggt caa ate tac aag aga cac tta gaa get ata ctt aca aaa ata 
Asp Gly Gin He Tyr Lys Arg His Leu Glu Ala He Leu Thr Lys He 

375 380 385 

tea ctg aag aac aac tta gac cat ttg ttg get age ctt eta ttt gaa 
Ser Leu Lys Asn Asn Leu Asp His Leu Leu Ala Ser Leu Leu Phe Glu 

390 395 400 

gag tat att tea tat agt tea cag gaa gaa atg gat tct aat aaa gtg 
Glu Tyr He Ser Tyr Ser Ser Gin Glu Glu Met Asp Ser Asn Lys Val 

405 410 415 

tct ttg ctt aat gaa caa ttt ctt cca etc att aga ctt tta gaa age 
Ser Leu Leu Asn Glu Gin Phe Leu Pro Leu He Arg Leu Leu Glu Ser 

420 425 430 

aaa tac ccc aga aca tta gat gtt gta tta gag gaa cac tta aag gaa 
Lys Tyr Pro Arg Thr Leu Asp Val Val Leu Glu Glu His Leu Lys Glu 
435 440 445 450 

att gca gat ctg aaa aaa caa gag ctt ttc cat cag ttt gtt tct ctt 
He Ala Asp Leu Lys Lys Gin Glu Leu Phe His Gin Phe Val Ser Leu 

455 460 465 

tct aca agt gga gga aag tat cag ttt tta gca gat tct gat act tct 
Ser Thr Ser Gly Gly Lys Tyr Gin Phe Leu Ala Asp Ser Asp Thr Ser 

470 475 480 

ttg atg etc age ctg aat cat cca ctt get cct gtg aga att ctg gec 
Leu Met Leu Ser Leu Asn His Pro Leu Ala Pro Val Arg He Leu Ala 

485 490 495 

atg aat cat ttg aaa aag ate atg aaa aca tea aag gag ggt gtt gat 
Met Asn His Leu Lys Lys He Met Lys Thr Ser Lys Glu Gly Val Asp 

500 505 510 

gaa tct ttc ata aaa gaa get gtt tta gee cga tta ggt gat gat aat 
Glu Ser Phe He Lys Glu Ala Val Leu Ala Arg Leu Gly Asp Asp Asn 
515 520 525 530 

ata gat gtt gtt ttg teg get ata agt get ttt gag att ttc aaa gaa 
He Asp Val Val Leu Ser Ala He Ser Ala Phe Glu He Phe Lys Glu 

535 540 545 

cac ttc agt tea gaa gtg acg att tea aat ctt ctg aat etc ttt caa 
His Phe Ser Ser Glu Val Thr He Ser Asn Leu Leu Asn Leu Phe Gin 

550 555 560 

aga gca gaa ctt tea aag aat gga gaa tgg tac gag gta ctt aag ata 
Arg Ala Glu Leu Ser Lys Asn Gly Glu Trp Tyr Glu Val Leu Lys He 

565 570 575 

gec get gac ata tta att aaa gaa gag ata ctg agt gaa aat gat cag 
Ala Ala Asp He Leu He Lys Glu Glu He Leu Ser Glu Asn Asp Gin 

580 585 590 

ttg tca aat cag gtg gtt gta tgt ttg ctg cca ttt gtg gtt atc aat
Leu Ser Asn Gln Val Val Val Cys Leu Leu Pro Phe Val Val Ile Asn
595 600 605 610 

aat gat gat acg gaa tct get gag atg aaa att get ata tat tta tea 
Asn Asp Asp Thr Glu Ser Ala Glu Met Lys He Ala He Tyr Leu Ser 

615 620 625 

aaa tea gga ate tgc tec ctg cac cct eta tta aga ggc tgg gaa gaa 
Lys Ser Gly He Cys Ser Leu His Pro Leu Leu Arg Gly Trp Glu Glu 

630 635 640 

get ctt gaa aat gta att aaa age aca aag cca gga aaa eta ate ggt 
Ala Leu Glu Asn Val He Lys Ser Thr Lys Pro Gly Lys Leu He Gly 

645 650 655 

gta gca aat cag aag atg att gag ttg ttg get gat aat ata aat tta 
Val Ala Asn Gin Lys Met He Glu Leu Leu Ala Asp Asn He Asn Leu 

660 665 670 

gga gat cct tct tea atg tta aag atg gtg gag gat ttg ata age gtg 
Gly Asp Pro Ser Ser Met Leu Lys Met Val Glu Asp Leu He Ser Val 
675 680 685 690 

ggt gag gag gag tec ttt aac ctg aag cag aaa gta acg ttt cat gtg 
Gly Glu Glu Glu Ser Phe Asn Leu Lys Gin Lys Val Thr Phe His Val 

695 700 705 

ate ctg tct gtg etc gtc tct tgt tgt tea tct tta aaa gaa ace cac 
He Leu Ser Val Leu Val Ser Cys Cys Ser Ser Leu Lys Glu Thr His 

710 715 720 

ttt cca ttt gcg ata aga gtc ttc agt ttg ttg cag aaa aaa ata aag 
Phe Pro Phe Ala He Arg Val Phe Ser Leu Leu Gin Lys Lys He Lys 

725 730 735 

aag ctt gaa agt gtc att act gca gtg gaa ate ccc tea gaa tgg cac 
Lys Leu Glu Ser Val He Thr Ala Val Glu He Pro Ser Glu Trp His 

740 745 750 

att gaa ctg atg tta gac aga ggg ate cca gta gag ctg tgg gca cat 
He Glu Leu Met Leu Asp Arg Gly He Pro Val Glu Leu Trp Ala His 
755 760 765 770 

tat gta gaa gag etc aac age act cag agg gtg gee gtg gag gac teg 
Tyr Val Glu Glu Leu Asn Ser Thr Gin Arg Val Ala Val Glu Asp Ser 

775 780 785 

gtt ttt ctt gta ttt tec ttg aaa aaa ttt att tat gca ctg aaa get 
Val Phe Leu Val Phe Ser Leu Lys Lys Phe lie Tyr Ala Leu Lys Ala 

790 795 800 

cct aaa tct ttt cct aaa ggt gat ata tgg tgg aat cct gaa caa ctg 
Pro Lys Ser Phe Pro Lys Gly Asp He Trp Trp Asn Pro Glu Gin Leu 

805 810 815 

aaa gaa gac age agg gac tat ctg cac ttg etc att ggg ctg ttt gag 
Lys Glu Asp Ser Arg Asp Tyr Leu His Leu Leu He Gly Leu Phe Glu 

820 825 830 

atg atg etc aat ggt gee gat get gtt cat ttc aga gtt ctg atg aaa 
Met Met Leu Asn Gly Ala Asp Ala Val His Phe Arg Val Leu Met Lys 
835 840 845 850 

ctt ttc ata aag gtk cat eta gaa gat gtt ttt cag tta ttc aag ttc 
Leu Phe He Lys Val His Leu Glu Asp Val Phe Gin Leu Phe Lys Phe 

855 860 865 

tgt tct gtt tta tgg ace tat ggt tct age ctt tea aat cca eta aac 
Cys Ser Val Leu Trp Thr Tyr Gly Ser Ser Leu Ser Asn Pro Leu Asn 

870 875 880 

tgc agt gtg aaa aca gtg ctg cag act caa get ctt tat gtg ggc tgt 
Cys Ser Val Lys Thr Val Leu Gin Thr Gin Ala Leu Tyr Val Gly Cys 

885 890 895 

gca atg ctt tct tct cag aag aca cag tgt aaa cac caa ctg gca tec 
Ala Met Leu Ser Ser Gin Lys Thr Gin Cys Lys His Gin Leu Ala Ser 

900 905 910 

ata tct tct cca gtg gtg aca tct tta etc att aac ctg gga age ccc 
He Ser Ser Pro Val Val Thr Ser Leu Leu He Asn Leu Gly Ser Pro 
915 920 925 930 
gta aaa gaa gtt cgt agg gct gcc att cag tgt ctc cag gcc ctc agt
Val Lys Glu Val Arg Arg Ala Ala Ile Gln Cys Leu Gln Ala Leu Ser

935 940 945 

gga gtg gca tec ccg ttt tat ctg ata ata gat cat ttg att tct aaa 
Gly Val Ala Ser Pro Phe Tyr Leu He He Asp His Leu lie Ser Lys 

950 955 960 

gca gag gag ate act tea gat get gec tat gtt att cag gat ttg get 
Ala Glu Glu He Thr Ser Asp Ala Ala Tyr Val He Gin Asp Leu Ala 

965 970 975 

act tta ttt gag gaa cta cag aga gaa aag aaa ctg aaa tct cat cag
Thr Leu Phe Glu Glu Leu Gln Arg Glu Lys Lys Leu Lys Ser His Gln

980 985 990 

aag ttg tct gaa act ttg aaa aac tta ctt agt tgt gtg tat agt tgc 
Lys Leu Ser Glu Thr Leu Lys Asn Leu Leu Ser Cys Val Tyr Ser Cys 
995 1000 1005 1010 

cca tct tat ata gca aaa gat ttg atg aaa gta ctt cag gga gtc aac
Pro Ser Tyr Ile Ala Lys Asp Leu Met Lys Val Leu Gln Gly Val Asn

1015 1020 1025 

ggt gag atg gtg ctt tct cag eta ttg cct atg get gaa caa ctg eta 
Gly Glu Met Val Leu Ser Gin Leu Leu Pro Met Ala Glu Gin Leu Leu 

1030 1035 1040 

gaa aag ate cag aag gag ccc aca get gtg ctg aaa gat gag gee atg 
Glu Lys He Gin Lys Glu Pro Thr Ala Val Leu Lys Asp Glu Ala Met 

1045 1050 1055

gtt ctg cat etc act ctg gga aag tat aat gaa ttt tea gtt tec ctt 
Val Leu His Leu Thr Leu Gly Lys Tyr Asn Glu Phe Ser Val Ser Leu 

1060 1065 1070 

tta aat gag gat ccg aag agt eta gat ata ttt ata aaa get gtg cac 
Leu Asn Glu Asp Pro Lys Ser Leu Asp He Phe He Lys Ala Val His 
1075 1080 1085 1090 

aca aca aag gaa ctt tac gcg gga atg cca acc att cag ate aca gec 
Thr Thr Lys Glu Leu Tyr Ala Gly Met Pro Thr He Gin He Thr Ala 

1095 1100 1105

ctt aaa aag att aca aaa cca ttt ttt gca gec ata tea gat gaa aaa 
Leu Glu Lys He Thr Lys Pro Phe Phe Ala Ala He Ser Asp Glu Lys 

1110 1115 1120

gtt cag cag aag ctt tta aga atg ttg ttt gat tta ttg gtg aac tgt 
Val Gin Gin Lys Leu Leu Arg Met Leu Phe Asp Leu Leu Val Asn Cys 

1125 1130 1135

aaa aac tea cat tgt get cag act gtc age agt gtt ttt aaa ggg att 
Lys Asn Ser His Cys Ala Gin Thr Val Ser Ser Val Phe Lys Gly He 

1140 1145 1150

tec gtt aat get gaa caa gtc cga ata gaa ctg gag cca cca gat aaa 
Ser Val Asn Ala Glu Gin Val Arg He Glu Leu Glu Pro Pro Asp Lys 
1155 1160 1165 1170

act aaa ccc ttg ggc aca gtt cag caa aaa aga agg caa aaa atg cag 
Ala Lys Pro Leu Gly Thr Val Gin Gin Lys Arg Arg Gin Lys Met Gin 

1175 1180 1185

cag aaa aaa tea caa gat eta gaa tct gtt cag gaa gtt gga ggt tct 
Gin Lys Lys Ser Gin Asp Leu Glu Ser Val Gin Glu Val Gly Gly Ser 

1190 1195 1200

tac tgg caa aga gta act etc ate ctg gaa tta ctg cag cac aaa aag 
Tyr Trp Gin Arg Val Thr Leu He Leu Glu Leu Leu Gin His Lys Lys 

1205 1210 1215 

aag ctc aga agt cct cag ata ttg gtg cca act ctt ttt aac ttg cta
Lys Leu Arg Ser Pro Gln Ile Leu Val Pro Thr Leu Phe Asn Leu Leu

1220 1225 1230

tca aga tgt tta gaa ccc ttg cca caa gag cag gga aat atg gaa tac
Ser Arg Cys Leu Glu Pro Leu Pro Gln Glu Gln Gly Asn Met Glu Tyr
1235 1240 1245 1250 

acc aaa caa tta att ctt agt tgt ctg etc aac ate tgc caa aaa eta 
Thr Lys Gin Leu He Leu Ser Cys Leu Leu Asn He Cys Gin Lys Leu 
1255 1260 1265

tct cca gat ggt ggc aaa ata ccc aaa gat att tta gat gag gag aag 
Ser Pro Asp Gly Gly Lys He Pro Lys Asp He Leu Asp Glu Glu Lys 

1270 1275 1280 

ttc aac gtg gag ttg ata gtt cag tgc ate cgc ctt teg gag atg ccg 
Phe Asn Val Glu Leu He Val Gin Cys He Arg Leu Ser Glu Met Pro 

1285 1290 1295

cag acc cat cac cat gec ctt tta ctt ttg ggc act gtt get gga ata 
Gin Thr His His His Ala Leu Leu Leu Leu Gly Thr Val Ala Gly He 

1300 1305 1310 

ttt ccg gat aaa gtt tta cac aat ate atg tct att ttt aca ttt atg 
Phe Pro Asp Lys Val Leu His Asn He Met Ser He Phe Thr Phe Met 
1315 1320 1325 1330 

gga gec aat gtc atg cgc eta gat gat act tac agt ttt caa gtt att 
Gly Ala Asn Val Met Arg Leu Asp Asp Thr Tyr Ser Phe Gin Val He 

1335 1340 1345 

aac aag aca gtg aaa atg gtt att ccc gca ctt att cag tct gat agt 
Asn Lys Thr Val Lys Met Val He Pro Ala Leu He Gin Ser Asp Ser 

1350 1355 1360 

gga gat tct ata gaa gtt tea aga aac gtt gaa gag att gtg gta aaa 
Gly Asp Ser He Glu Val Ser Arg Asn Val Glu Glu He Val Val Lys 

1365 1370 1375 

ate att agt gta ttt gtg gat gcg ctg cca cac gtc ccg gag cac agg 
He He Ser Val Phe Val Asp Ala Leu Pro His Val Pro Glu His Arg 

1380 1385 1390 

cgc ctg ccc ate ctt gtt caa ctt gtt gat aca ctg ggt gca gag aaa 
Arg Leu Pro He Leu Val Gin Leu Val Asp Thr Leu Gly Ala Glu Lys 
1395 1400 1405 1410 

ttc etc tgg att etc etc ate ttg ctt ttt gaa cag tat gtc aca aaa 
Phe Leu Trp He Leu Leu He Leu Leu Phe Glu Gin Tyr Val Thr Lys 

1415 1420 1425 

aca gtg ctg gcg get gec tat ggc gaa aag gat get att tta gaa gca 
Thr Val Leu Ala Ala Ala Tyr Gly Glu Lys Asp Ala He Leu Glu Ala 

1430 1435 1440 

gac act gaa ttt tgg ttt tea gtc tgt tgt gag ttt agt gtc cag cat 
Asp Thr Glu Phe Trp Phe Ser Val Cys Cys Glu Phe Ser Val Gin His 

1445 1450 1455 

cag ata caa age ttg atg aat ate etc cag tac tta eta aag ctg cca 
Gin He Gin Ser Leu Met Asn He Leu Gin Tyr Leu Leu Lys Leu Pro 

1460 1465 1470 

gag gaa aaa gaa gaa acc att ccc aaa gca gtg tea ttt aat aag agt 
Glu Glu Lys Glu Glu Thr He Pro Lys Ala Val Ser Phe Asn Lys Ser 
1475 1480 1485 1490 

gaa tea caa gaa gaa atg eta cag gtt ttt aat gta gag act cac act 
Glu Ser Gin Glu Glu Met Leu Gin Val Phe Asn Val Glu Thr His Thr 

1495 1500 1505 

age aag caa ctg egg cat ttt aaa ttt ttg tea gtg tec ttc atg tct 
Ser Lys Gin Leu Arg His Phe Lys Phe Leu Ser Val Ser Phe Met Ser 

1510 1515 1520 

cag etc ctg tct tec aat aat ttt ctg aaa aag gta gtt gag agt ggt 
Gin Leu Leu Ser Ser Asn Asn Phe Leu Lys Lys Val Val Glu Ser Gly 

1525 1530 1535 

ggt cct gag att tta aaa ggc ctt gaa gag agg ttg ctg gag acc gtt 
Gly Pro Glu He Leu Lys Gly Leu Glu Glu Arg Leu Leu Glu Thr Val 

1540 1545 1550 

etc ggc tat ate agt gca gtt gca cag tec atg gaa agg aac gca gac 
Leu Gly Tyr He Ser Ala Val Ala Gin Ser Met Glu Arg Asn Ala Asp 
1555 1560 1565 1570 

aaa etc acc gtg aag ttc tgg cgc gcg etc ctt agt aaa get tac gac 
Lys Leu Thr Val Lys Phe Trp Arg Ala Leu Leu Ser Lys Ala Tyr Asp 

1575 1580 1585 

ctg tta gat aag gtc aat gcc ttg ctg ccc aca gag aca ttc att cct
Leu Leu Asp Lys Val Asn Ala Leu Leu Pro Thr Glu Thr Phe Ile Pro

1590 1595 1600 

gtg ate aga ggg ctg gtg ggc aat ccc ctg cca tct gtt cgc cgc aaa 
Val He Arg Gly Leu Val Gly Asn Pro Leu Pro Ser Val Arg Arg Lys 

1605 1610 1615 

gcg ctg gac ctt ttg aat aac aag ctg cag caa aat ata tec tgg aag 
Ala Leu Asp Leu Leu Asn Asn Lys Leu Gin Gin Asn He Ser Trp Lys 

1620 1625 1630 

aag aca ata gtt acc cgt ttc eta aaa ctg gtt cca gac ctt ttg gec 
Lys Thr He Val Thr Arg Phe Leu Lys Leu Val Pro Asp Leu Leu Ala 
1635 1640 1645 1650 

att gtg cag cgt aag aaa aag gaa ggg gaa gaa gaa caa gca ate aac 
He Val Gin Arg Lys Lys Lys Glu Gly Glu Glu Glu Gin Ala He Asn 

1655 1660 1665 

aga cag aca gcg ttg tat acc tta aag ctt tta tgc aag aat ttt ggt 
Arg Gin Thr Ala Leu Tyr Thr Leu Lys Leu Leu Cys Lys Asn Phe Gly 

1670 1675 1680 

gca gaa aat cca gat cct ttt gtc cca gtg ctg arc act get gtg aaa 
Ala Glu Asn Pro Asp Pro Phe Val Pro Val Leu Xaa Thr Ala Val Lys 

1685 1690 1695 

ctg att get cca gag aga aag gag gag aag aat gtc ytg gga age gcg 
Leu He Ala Pro Glu Arg Lys Glu Glu Lys Asn Val Leu Gly Ser Ala 

1700 1705 1710 

ctg ctg tgc ata gca gag gtg acc tec acc ctg gag gcg ctg gee ate 
Leu Leu Cys He Ala Glu Val Thr Ser Thr Leu Glu Ala Leu Ala He 
1715 1720 1725 1730 

ccc cag ctt ccc age ctg atg cca teg ttg ctg aca aca atg aag aac 
Pro Gin Leu Pro Ser Leu Met Pro Ser Leu Leu Thr Thr Met Lys Asn 

1735 1740 1745 

acc age gag ctg gtc tec age gag gtc tac ctg etc agt gee ttg get 
Thr Ser Glu Leu Val Ser Ser Glu Val Tyr Leu Leu Ser Ala Leu Ala 

1750 1755 1760 

get ctg cag aag gtt gtg gag act etc ccg cac ttc ate age ccc tat 
Ala Leu Gin Lys Val Val Glu Thr Leu Pro His Phe He Ser Pro Tyr 

1765 1770 1775 

ctg gaa ggc att etc tec cag gtg att cat ctg gag aaa ate act agt 
Leu Glu Gly He Leu Ser Gin Val He His Leu Glu Lys He Thr Ser 

1780 1785 1790 

gaa atg ggt tct gcg tea cag get aat ate cgt etc aca tct ctt aaa 
Glu Met Gly Ser Ala Ser Gin Ala Asn He Arg Leu Thr Ser Leu Lys 
1795 1800 1805 1810 

aag aca ctg get acc aca ctt gca ccc cga gtc ctg ttg ccc gec ate 
Lys Thr Leu Ala Thr Thr Leu Ala Pro Arg Val Leu Leu Pro Ala He 

1815 1820 1825 

aaa aaa act tac aag cag att gag aag aac tgg aag aat cac atg ggt 
Lys Lys Thr Tyr Lys Gin He Glu Lys Asn Trp Lys Asn His Met Gly 

1830 1835 1840 

ccg ttt atg age ate ttg caa gag cat att ggg gyg atg aag aag gaa 
Pro Phe Met Ser He Leu Gin Glu His He Gly Xaa Met Lys Lys Glu 

1845 1850 1855 

gag etc acc tec cat cag tct cag eta acc gec ttt ttc ctg gar gec 
Glu Leu Thr Ser His Gin Ser Gin Leu Thr Ala Phe Phe Leu Glu Ala 

1860 1865 1870 

ctg gac ttc cga gee cag cac tct gag aac gat ctg gag gaa gtt gga 
Leu Asp Phe Arg Ala Gin His Ser Glu Asn Asp Leu Glu Glu Val Gly 
1875 1880 1885 1890 

aaa acg gaa aat tgt ate att gac tgt eta gta gee atg gtt gtc aaa 
Lys Thr Glu Asn Cys He He Asp Cys Leu Val Ala Met Val Val Lys 

1895 1900 1905 

ctt tec gag gtc aca ttc agg ccc ctg ttc ttc aag ctg ttt gat tgg 
Leu Ser Glu Val Thr Phe Arg Pro Leu Phe Phe Lys Leu Phe Asp Trp 
1910 1915 1920 
gct aaa aca gaa gat gcc cca aag gac agg ttg ttg aca ttt tac aac
Ala Lys Thr Glu Asp Ala Pro Lys Asp Arg Leu Leu Thr Phe Tyr Asn 

1925 1930 1935 

ttg gca gat tgc att get gaa aag ctg aaa ggg ctt ttt act ctg ttt 
Leu Ala Asp Cys He Ala Glu Lys Leu Lys Gly Leu Phe Thr Leu Phe 

1940 1945 1950 

gec ggc cac tta gtg aag cct ttt get gac acc ttg rac cag gtg aac 
Ala Gly His Leu Val Lys Pro Phe Ala Asp Thr Leu Xaa Gin Val Asn 
1955 1960 1965 1970

ate tec aaa aca gat gaa gca ttt ttt gac tct gaa aat gac cct gaa 
He Ser Lys Thr Asp Glu Ala Phe Phe Asp Ser Glu Asn Asp Pro Glu 

1975 1980 1985 

aag tgc tgc ttg ctg ttg cag ttt att ttg aac tgt tta tac aaa ate 
Lys Cys Cys Leu Leu Leu Gin Phe He Leu Asn Cys Leu Tyr Lys He 

1990 1995 2000 

ttc ctt ttt gat acc cag cat ttt ata agt aaa gag aga gca gra gec 
Phe Leu Phe Asp Thr Gin His Phe He Ser Lys Glu Arg Ala Xaa Ala 

2005 2010 2015 

ttg atg atg cct ctg gtg gat cag ctg gaa aac agg ctt ggg gga gaa 
Leu Met Met Pro Leu Val Asp Gin Leu Glu Asn Arg Leu Gly Gly Glu 

2020 2025 2030

gag aaa ttc cag gaa egg gtg aca aag cac ctg ata cca tgc ate gca 
Glu Lys Phe Gin Glu Arg Val Thr Lys His Leu He Pro Cys He Ala 
2035 2040 2045 2050 

cag ttt tcr gtg gec atg gcg gat gac tct ctt tgg aaa cca ctg aac 
Gin Phe Ser Val Ala Met Ala Asp Asp Ser Leu Trp Lys Pro Leu Asn 

2055 2060 2065 

tac cag att ctg eta aag acg aga gac tec teg cct aag gtt cga ttt 
Tyr Gin He Leu Leu Lys Thr Arg Asp Ser Ser Pro Lys Val Arg Phe 

2070 2075 2080 

get get ttg att act gtg tta gca ctg get gaa aaa eta aag gag aat 
Ala Ala Leu He Thr Val Leu Ala Leu Ala Glu Lys Leu Lys Glu Asn 

2085 2090 2095 

tat att gtc ttg eta cca gaa tec att cct ttc tta gca gag ttg atg 
Tyr He Val Leu Leu Pro Glu Ser He Pro Phe Leu Ala Glu Leu Met 

2100 2105 2110 

gaa gat gaa tgt gaa gaa gta gaa cat cag tgc caa aag act att cag 
Glu Asp Glu Cys Glu Glu Val Glu His Gin Cys Gin Lys Thr He Gin 
2115 2120 2125 2130 

caa ctg gaa act gtc ctg gga gag cca ctc cag agc tat ttc taa
Gln Leu Glu Thr Val Leu Gly Glu Pro Leu Gln Ser Tyr Phe *

2135 2140 2145 

gactttctgt ggtgtttcat actctactca gagttcacac tcatatttca tatttttatt 6607
tttgggtgtt gggtgccatg ttacttttgg tgccttaata cacctacttg gattacttac 6667
aaatgtttta tcacttcgtt acaaaatccc cacctggctt gtgctgccac ataagcctct 6727
cctgcctatc gtatagagct gcagaaagag taaatgatac acggtatttt tatac 6782

<210> 3

<211> 7932

<212> DNA

<213> Homo sapiens

<220>

<221> 5'UTR
<222> 1..112

<220>
<221> CDS
<222> 113..6547

<220>

<221> 3'UTR
<222> 6548..7932
<220> 

<221> allele 
<222> 178 

<223> 5-382-162 : polymorphic base C or T 
<220> 

<221> allele 
<222> 2677 

<223> 5-383-184 : polymorphic base G or T 
<220> 

<221> allele 
<222> 5193 

<223> 5-370-197 : polymorphic base A or G 
<220> 

<221> allele 
<222> 5243 

<223> 5-370-247 : polymorphic base C or T 
<220> 

<221> allele 
<222> 5673 

<223> 5-373-164 : polymorphic base C or T 
<220> 

<221> allele 
<222> 5731 

<223> 5-373-222 : polymorphic base A or G 
<220> 

<221> allele 
<222> 6011 

<223> 5-375-200 : polymorphic base A or G 
<220> 

<221> allele 
<222> 6162 

<223> 5-376-266 : polymorphic base A or G 
<220> 

<221> allele 
<222> 6271 

<223> 5-377-227 : polymorphic base A or G 
<220> 

<221> allele 
<222> 7343 

<223> 5-403-156 : polymorphic base C or T 
<400> 3 

ggttgggctc cttggtacca tgtgggaagc gctgtgaaga gttgttgcct tccaagatat 
acccaaattc ccagttccag cccgtgtcat taaaactccg ctggcgtgaa ag atg acg 

Met Thr 
1 

tcc tta gcc cag cag ctg caa cga ctc gcc ctc cct caa agt gat gcc
Ser Leu Ala Gln Gln Leu Gln Arg Leu Ala Leu Pro Gln Ser Asp Ala

5 10 15

agc ctc tta tcy aga gat gaa gtt gct tct ttg tta ttt gac cct aag
Ser Leu Leu Ser Arg Asp Glu Val Ala Ser Leu Leu Phe Asp Pro Lys

20 25 30

gaa gcg gcc aca atc gac agg gac acc gcc ttc gcc att gga tgt act

Glu Ala Ala Thr Ile Asp Arg Asp Thr Ala Phe Ala Ile Gly Cys Thr

35 40 45 50 

ggc ctg gaa gag ttg ctt gga att gat cct tec ttt gag cag ttt gaa 

Gly Leu Glu Glu Leu Leu Gly He Asp Pro Ser Phe Glu Gin Phe Glu 

55 60 65 

gca ccg ttg ttc agt cag eta gca aaa acc ttg gag cga agt gtt cag 

Ala Pro Leu Phe Ser Gin Leu Ala Lys Thr Leu Glu Arg Ser Val Gin 

70 75 80 

acc aaa gca gta aac aaa cag ttg gat gaa aac att tea tta ttc ctt 

Thr Lys Ala Val Asn Lys Gin Leu Asp Glu Asn He Ser Leu Phe Leu 

85 90 95 

att cac ttg teg cct tac ttc ctg ctt aag cca gca cag aag tgt ctg 

He His Leu Ser Pro Tyr Phe Leu Leu Lys Pro Ala Gin Lys Cys Leu 

100 105 110 

gag tgg ttg att cac agg ttc cat ata cat etc tat aat caa gat age 

Glu Trp Leu He His Arg Phe His He His Leu Tyr Asn Gin Asp Ser 

115 120 125 130 

etc att get tgt gtt ctg cca tac cac gag aca aga ata ttt gtg cga 

Leu He Ala Cys Val Leu Pro Tyr His Glu Thr Arg He Phe Val Arg 

135 140 145 

gtc ata cag ctt eta aaa att aat aat tea aag cac aga tgg ttc tgg 

Val He Gin Leu Leu Lys He Asn Asn Ser Lys His Arg Trp Phe Trp 

150 155 160 

ttg ttg cca gtt aag caa tct gga gtg ccg tta get aaa gga act ttg 

Leu Leu Pro Val Lys Gin Ser Gly Val Pro Leu Ala Lys Gly Thr Leu 

165 170 175 

att acc cac tgc tac aaa gat ctt gga ttc atg gat ttc att tgc agt 

He Thr His Cys Tyr Lys Asp Leu Gly Phe Met Asp Phe He Cys Ser 

180 185 190 

ttg gtg aca aaa tct gtg aag gtt ttt get gag tac ccg ggc age tea 

Leu Val Thr Lys Ser Val Lys Val Phe Ala Glu Tyr Pro Gly Ser Ser 

195 200 205 210 

get cag ttg agg gtg etc ttg get ttc tat get tct acc ata gtg teg 

Ala Gin Leu Arg Val Leu Leu Ala Phe Tyr Ala Ser Thr He Val Ser 

215 220 225 

gcg ctg gta get gca gag gac gta tea gac aat ate ate gcc aaa eta 

Ala Leu Val Ala Ala Glu Asp Val Ser Asp Asn He He Ala Lys Leu 

230 235 240 

ttt ccc tat ate caa aag gga ttg aaa tea tct tta cca gat tac aga 

Phe Pro Tyr He Gin Lys Gly Leu Lys Ser Ser Leu Pro Asp Tyr Arg 

245 250 255 

get gca aca tac atg ata ata tgt cag att tct gtg aaa gtg acc atg 

Ala Ala Thr Tyr Met He He Cys Gin He Ser Val Lys Val Thr Met 

260 265 270 

gaa aat acc ttt gtg aat tea ttg gca tea cag ate ate aaa aca ttg 

Glu Asn Thr Phe Val Asn Ser Leu Ala Ser Gin He He Lys Thr Leu 

275 280 285 290 

acc aag att ccc tct ttg ate aag gat ggg tta agt tgc ttg ata gtg 

Thr Lys He Pro Ser Leu He Lys Asp Gly Leu Ser Cys Leu He Val 

295 300 305 

etc ctg cag aga cag aag cca gag age ctt ggg aaa aag cca ttc cct 

Leu Leu Gin Arg Gin Lys Pro Glu Ser Leu Gly Lys Lys Pro Phe Pro 

310 315 320 

cac tta tgt aat gtt cct gat ctt att aca ata ctt cat ggg att tct 

His Leu Cys Asn Val Pro Asp Leu He Thr He Leu His Gly He Ser 

325 330 335 

gaa act tac gat gtc agt cct ctt ctg cgt tac atg ctt ccc cat ctg 

Glu Thr Tyr Asp Val Ser Pro Leu Leu Arg Tyr Met Leu Pro His Leu 

340 345 350 

gtc gtc tcc atc att cat cat gtt aca gga gaa gaa act gaa gga atg
Val Val Ser Ile Ile His His Val Thr Gly Glu Glu Thr Glu Gly Met
355 360 365 370 

gat ggt caa ate tac aag aga cac tta gaa get ata ctt aca aaa ata 
Asp Gly Gin He Tyr Lys Arg His Leu Glu Ala He Leu Thr Lys He 

375 380 385 

tea ctg aag aac aac tta gac cat ttg ttg get age ctt eta ttt gaa 
Ser Leu Lys Asn Asn Leu Asp His Leu Leu Ala Ser Leu Leu Phe Glu 

390 395 400 

gag tat att tea tat agt tea cag gaa gaa atg gat tct aat aaa gtg 
Glu Tyr He Ser Tyr Ser Ser Gin Glu Glu Met Asp Ser Asn Lys Val 

405 410 415 

tct ttg ctt aat gaa caa ttt ctt cca etc att aga ctt tta gaa age 
Ser Leu Leu Asn Glu Gin Phe Leu Pro Leu He Arg Leu Leu Glu Ser 

420 425 430 

aaa tac ccc aga aca tta gat gtt gta tta gag gaa cac tta aag gaa 
Lys Tyr Pro Arg Thr Leu Asp Val Val Leu Glu Glu His Leu Lys Glu 
435 440 445 450 

att gca gat ctg aaa aaa caa gag ctt ttc cat cag ttt gtt tct ctt 
He Ala Asp Leu Lys Lys Gin Glu Leu Phe His Gin Phe Val Ser Leu 

455 460 465 

tct aca agt gga gga aag tat cag ttt tta gca gat tct gat act tct 
Ser Thr Ser Gly Gly Lys Tyr Gin Phe Leu Ala Asp Ser Asp Thr Ser 

470 475 480 

ttg atg etc age ctg aat cat cca ctt get cct gtg aga att ctg gee 
Leu Met Leu Ser Leu Asn His Pro Leu Ala Pro Val Arg He Leu Ala 

485 490 495 

atg aat cat ttg aaa aag ate atg aaa aca tea aag gag ggt gtt gat 
Met Asn His Leu Lys Lys He Met Lys Thr Ser Lys Glu Gly Val Asp 

500 505 510 

gaa tct ttc ata aaa gaa get gtt tta gee cga tta ggt gat gat aat 
Glu Ser Phe He Lys Glu Ala Val Leu Ala Arg Leu Gly Asp Asp Asn 
515 520 525 530 

ata gat gtt gtt ttg tcg gct ata agt gct ttt gag att ttc aaa gaa
Ile Asp Val Val Leu Ser Ala Ile Ser Ala Phe Glu Ile Phe Lys Glu

535 540 545 

cac ttc agt tea gaa gtg acg att tea aat ctt ctg aat etc ttt caa 
His Phe Ser Ser Glu Val Thr He Ser Asn Leu Leu Asn Leu Phe Gin 

550 555 560 

aga gca gaa ctt tea aag aat gga gaa tgg tac gag gta ctt aag ata 
Arg Ala Glu Leu Ser Lys Asn Gly Glu Trp Tyr Glu Val Leu Lys He 

565 570 575 

gec get gac ata tta att aaa gaa gag ata ctg agt gaa aat gat cag 
Ala Ala Asp He Leu He Lys Glu Glu He Leu Ser Glu Asn Asp Gin 

580 585 590 

ttg tea aat cag gtg gtt gta tgt ttg ctg cca ttt gtg gtt ate aat 
Leu Ser Asn Gin Val Val Val Cys Leu Leu Pro Phe Val Val He Asn 
595 600 605 610 

aat gat gat acg gaa tct get gag atg aaa att get ata tat tta tea 
Asn Asp Asp Thr Glu Ser Ala Glu Met Lys He Ala He Tyr Leu Ser 

615 620 625 

aaa tea gga ate tgc tec ctg cac cct eta tta aga ggc tgg gaa gaa 
Lys Ser Gly He Cys Ser Leu His Pro Leu Leu Arg Gly Trp Glu Glu 

630 635 640 

get ctt gaa aat gta att aaa age aca aag cca gga aaa eta ate ggt 
Ala Leu Glu Asn Val He Lys Ser Thr Lys Pro Gly Lys Leu He Gly 

645 650 655 

gta gca aat cag aag atg att gag ttg ttg get gat aat ata aat tta 
Val Ala Asn Gin Lys Met He Glu Leu Leu Ala Asp Asn He Asn Leu 

660 665 670 

gga gat cct tct tea atg tta aag atg gtg gag gat ttg ata age gtg 
Gly Asp Pro Ser Ser Met Leu Lys Met Val Glu Asp Leu He Ser Val 
675 680 685 690 
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ggt gag gag gag tec ttt aac ctg aag cag aaa gta acg ttt cat gtg 
Gly Glu Glu Glu Ser Phe Asn Leu Lys Gin Lys Val Thr Phe His Val 

695 700 705 

ate ctg tct gtg etc gtc tct tgt tgt tea tct tta aaa gaa acc cac 
He Leu Ser Val Leu Val Ser Cys Cys Ser Ser Leu Lys Glu Thr His 

710 715 720 

ttt cca ttt gcg ata aga gtc ttc agt ttg ttg cag aaa aaa ata aag 
Phe Pro Phe Ala He Arg Val Phe Ser Leu Leu Gin Lys Lys He Lys 

725 730 735 

aag ctt gaa agt gtc att act gca gtg gaa ate ccc tea gaa tgg cac 
Lys Leu Glu Ser Val He Thr Ala Val Glu He Pro Ser Glu Trp His 

740 745 750 

att gaa ctg atg tta gac aga ggg ate cca gta gag ctg tgg gca cat 
He Glu Leu Met Leu Asp Arg Gly He Pro Val Glu Leu Trp Ala His 
755 760 765 770 

tat gta gaa gag etc aac age act cag agg gtg gee gtg gag gac teg 
Tyr Val Glu Glu Leu Asn Ser Thr Gin Arg Val Ala Val Glu Asp Ser 

775 780 785 

gtt ttt ctt gta ttt tec ttg aaa aaa ttt att tat gca ctg aaa get 
Val Phe Leu Val Phe Ser Leu Lys Lys Phe He Tyr Ala Leu Lys Ala 

790 795 800 

cct aaa tct ttt cct aaa ggt gat ata tgg tgg aat cct gaa caa ctg 
Pro Lys Ser Phe Pro Lys Gly Asp He Trp Trp Asn Pro Glu Gin Leu 

805 810 815 

aaa gaa gac age agg gac tat ctg cac ttg etc att ggg ctg ttt gag 
Lys Glu Asp Ser Arg Asp Tyr Leu His Leu Leu He Gly Leu Phe Glu 

820 825 830 

atg atg etc aat ggt gec gat get gtt cat ttc aga gtt ctg atg aaa 
Met Met Leu Asn Gly Ala Asp Ala Val His Phe Arg Val Leu Met Lys 
835 840 845 850 

ctt ttc ata aag gtk cat eta gaa gat gtt ttt cag tta ttc aag ttc 
Leu Phe He Lys Val His Leu Glu Asp Val Phe Gin Leu Phe Lys Phe 

855 860 865 

tgt tct gtt tta tgg acc tat ggt tct age ctt tea aat cca eta aac 
Cys Ser Val Leu Trp Thr Tyr Gly Ser Ser Leu Ser Asn Pro Leu Asn 

870 875 880 

tgc agt gtg aaa aca gtg ctg cag act caa get ctt tat gtg ggc tgt 
Cys Ser Val Lys Thr Val Leu Gin Thr Gin Ala Leu Tyr Val Gly Cys 

885 890 895 

gca atg ctt tct tct cag aag aca cag tgt aaa cac caa ctg gca tec 
Ala Met Leu Ser Ser Gin Lys Thr Gin Cys Lys His Gin Leu Ala Ser 

900 905 910 

ata tct tct cca gtg gtg aca tct tta etc att aac ctg gga age ccc 
He Ser Ser Pro Val Val Thr Ser Leu Leu He Asn Leu Gly Ser Pro 
915 920 925 930 

gta aaa gaa gtt cgt agg get gec att cag tgt etc cag gee etc agt 
Val Lys Glu Val Arg Arg Ala Ala He Gin Cys Leu Gin Ala Leu Ser 

935 940 945 

gga gtg gca tec ccg ttt tat ctg ata ata gat cat ttg att tct aaa 
Gly Val Ala Ser Pro Phe Tyr Leu He He Asp His Leu He Ser Lys 

950 955 960 

gca gag gag ate act tea gat get gec tat gtt att cag gat ttg get 
Ala Glu Glu He Thr Ser Asp Ala Ala Tyr Val He Gin Asp Leu Ala 

965 970 975 

act tta ttt gag gaa eta cag aga gaa aag aaa ctg aaa tct cat cag 
Thr Leu Phe Glu Glu Leu Gin Arg Glu Lys Lys Leu Lys Ser His Gin 

980 985 990 

aag ttg tct gaa act ttg aaa aac tta ctt agt tgt gtg tat agt tgc 
Lys Leu Ser Glu Thr Leu Lys Asn Leu Leu Ser Cys Val Tyr Ser Cys 
995 1000 1005 1010 

cca tct tat ata gca aaa gat ttg atg aaa gta ctt cag gga gtc aac
Pro Ser Tyr Ile Ala Lys Asp Leu Met Lys Val Leu Gln Gly Val Asn
1015 1020 1025

            gtg ctt tct cag cta ttg cct atg gct gaa caa ctg cta 3238
Gly Glu Met Val Leu Ser Gln Leu Leu Pro Met Ala Glu Gln Leu Leu
1030 1035 1040

gaa aag atc     aag gag ccc aca gct gtg ctg aaa gat gag gcc atg 3286
Glu Lys Ile Gln Lys Glu Pro Thr Ala Val Leu Lys Asp Glu Ala Met
1045 1050 1055

gtt ctg     ctc act ctg gga aag tat aat gaa ttt tca gtt tcc ctt 3334
Val Leu His Leu Thr Leu Gly Lys Tyr Asn Glu Phe Ser Val Ser Leu
1060 1065 1070

tta aat     gat ccg aag agt cta gat ata ttt ata aaa gct gtg cac 3382
Leu Asn Glu Asp Pro Lys Ser Leu Asp Ile Phe Ile Lys Ala Val His
1075 1080 1085 1090

aca aca     gaa ctt tac gcg gga atg cca acc att cag atc aca gcc 3430
Thr Thr Lys Glu Leu Tyr Ala Gly Met Pro Thr Ile Gln Ile Thr Ala
1095 1100 1105

ctt gaa aag att aca aaa cca ttt ttt gca gcc ata tca gat gaa aaa 3478
Leu Glu Lys Ile Thr Lys Pro Phe Phe Ala Ala Ile Ser Asp Glu Lys
1110 1115 1120

gtt cag cag aag ctt tta aga atg ttg ttt gat tta ttg gtg aac tgt 3526
Val Gln Gln Lys Leu Leu Arg Met Leu Phe Asp Leu Leu Val Asn Cys
1125 1130 1135

aaa aac tca cat tgt gct cag act gtc agc agt gtt ttt aaa ggg att 3574
Lys Asn Ser His Cys Ala Gln Thr Val Ser Ser Val Phe Lys Gly Ile
1140 1145 1150

tcc gtt aat gct gaa caa gtc cga ata gaa ctg gag cca cca gat aaa 3622
Ser Val Asn Ala Glu Gln Val Arg Ile Glu Leu Glu Pro Pro Asp Lys
1155 1160 1165 1170

gct     ccc ttg ggc aca gtt cag caa aaa aga agg caa     atg cag 3670
Ala Lys Pro Leu Gly Thr Val Gln Gln Lys Arg Arg Gln Lys Met Gln
1175 1180 1185

cag aaa aaa tca caa gat cta gaa tct gtt cag gaa gtt gga ggt tct 3718
Gln Lys Lys Ser Gln Asp Leu Glu Ser Val Gln Glu Val Gly Gly Ser
1190 1195 1200

tac tgg caa aga gta act ctc atc ctg gaa tta ctg cag cac aaa aag 3766
Tyr Trp Gln Arg Val Thr Leu Ile Leu Glu Leu Leu Gln His Lys Lys
1205 1210 1215

aag ctc aga agt cct cag ata ttg gtg cca act ctt ttt aac ttg cta 3814
Lys Leu Arg Ser Pro Gln Ile Leu Val Pro Thr Leu Phe Asn Leu Leu
1220 1225 1230

tca aga tgt tta gaa ccc ttg cca caa gag cag gga aat atg gaa tac 3862
Ser Arg Cys Leu Glu Pro Leu Pro Gln Glu Gln Gly Asn Met Glu Tyr
1235 1240 1245 1250

acc aaa caa tta att ctt agt tgt ctg ctc aac atc tgc caa aaa cta 3910
Thr Lys Gln Leu Ile Leu Ser Cys Leu Leu Asn Ile Cys Gln Lys Leu
1255 1260 1265

tct cca gat ggt ggc aaa ata ccc aaa gat att tta gat gag gag aag 3958
Ser Pro Asp Gly Gly Lys Ile Pro Lys Asp Ile Leu Asp Glu Glu Lys
1270 1275 1280

ttc aac gtg gag ttg ata gtt cag tgc atc cgc ctt tcg gag atg ccg 4006
Phe Asn Val Glu Leu Ile Val Gln Cys Ile Arg Leu Ser Glu Met Pro
1285 1290 1295

cag acc cat cac cat gcc ctt tta ctt ttg ggc act gtt gct gga ata 4054
Gln Thr His His His Ala Leu Leu Leu Leu Gly Thr Val Ala Gly Ile
1300 1305 1310

ttt ccg gat aaa gtt tta cac aat atc atg tct att ttt aca ttt atg 4102
Phe Pro Asp Lys Val Leu His Asn Ile Met Ser Ile Phe Thr Phe Met
1315 1320 1325 1330

gga gcc aat gtc atg cgc cta gat gat act tac agt ttt caa gtt att 4150
Gly Ala Asn Val Met Arg Leu Asp Asp Thr Tyr Ser Phe Gln Val Ile
1335 1340 1345

aac aag aca gtg aaa atg gtt att ccc gca ctt att cag tct gat agt 4198
Asn Lys Thr Val Lys Met Val Ile Pro Ala Leu Ile Gln Ser Asp Ser
1350 1355 1360

gga gat tct ata gaa gtt tca aga aac gtt gaa gag att gtg gta aaa
Gly Asp Ser Ile Glu Val Ser Arg Asn Val Glu Glu Ile Val Val Lys
1365 1370 1375

atc att agt gta ttt gtg gat gcg ctg cca cac gtc ccg gag cac agg
Ile Ile Ser Val Phe Val Asp Ala Leu Pro His Val Pro Glu His Arg
1380 1385 1390

cgc ctg ccc atc ctt gtt caa ctt gtt gat aca ctg ggt gca gag aaa
Arg Leu Pro Ile Leu Val Gln Leu Val Asp Thr Leu Gly Ala Glu Lys
1395 1400 1405 1410

ttc ctc tgg att ctc ctc atc ttg ctt ttt gaa cag tat gtc aca aaa
Phe Leu Trp Ile Leu Leu Ile Leu Leu Phe Glu Gln Tyr Val Thr Lys
1415 1420 1425

aca gtg ctg gcg gct gcc tat ggc gaa aag gat gct att tta gaa gca
Thr Val Leu Ala Ala Ala Tyr Gly Glu Lys Asp Ala Ile Leu Glu Ala
1430 1435 1440

gac act gaa ttt tgg ttt tca gtc tgt tgt gag ttt agt gtc cag cat
Asp Thr Glu Phe Trp Phe Ser Val Cys Cys Glu Phe Ser Val Gln His
1445 1450 1455

cag ata caa agc ttg atg aat atc ctc cag tac tta cta aag ctg cca
Gln Ile Gln Ser Leu Met Asn Ile Leu Gln Tyr Leu Leu Lys Leu Pro
1460 1465 1470

gag gaa aaa gaa gaa acc att ccc aaa gca gtg tca ttt aat aag agt
Glu Glu Lys Glu Glu Thr Ile Pro Lys Ala Val Ser Phe Asn Lys Ser
1475 1480 1485 1490

gaa tca caa gaa gaa atg cta cag gtt ttt aat gta gag act cac act
Glu Ser Gln Glu Glu Met Leu Gln Val Phe Asn Val Glu Thr His Thr
1495 1500 1505

agc aag caa ctg cgg cat ttt aaa ttt ttg tca gtg tcc ttc atg tct
Ser Lys Gln Leu Arg His Phe Lys Phe Leu Ser Val Ser Phe Met Ser
1510 1515 1520

cag ctc ctg tct tcc aat aat ttt ctg aaa aag gta gtt gag agt ggt
Gln Leu Leu Ser Ser Asn Asn Phe Leu Lys Lys Val Val Glu Ser Gly
1525 1530 1535

ggt cct gag att tta aaa ggc ctt gaa gag agg ttg ctg gag acc gtt
Gly Pro Glu Ile Leu Lys Gly Leu Glu Glu Arg Leu Leu Glu Thr Val
1540 1545 1550

ctc ggc tat atc agt gca gtt gca cag tcc atg gaa agg aac gca gac
Leu Gly Tyr Ile Ser Ala Val Ala Gln Ser Met Glu Arg Asn Ala Asp
1555 1560 1565 1570

aaa ctc acc gtg aag ttc tgg cgc gcg ctc ctt agt aaa gct tac gac
Lys Leu Thr Val Lys Phe Trp Arg Ala Leu Leu Ser Lys Ala Tyr Asp
1575 1580 1585

ctg tta gat aag gtc aat gcc ttg ctg ccc aca gag aca ttc att cct
Leu Leu Asp Lys Val Asn Ala Leu Leu Pro Thr Glu Thr Phe Ile Pro
1590 1595 1600

gtg atc aga ggg ctg gtg ggc aat ccc ctg cca tct gtt cgc cgc aaa
Val Ile Arg Gly Leu Val Gly Asn Pro Leu Pro Ser Val Arg Arg Lys
1605 1610 1615

gcg ctg gac ctt ttg aat aac aag ctg cag caa aat ata tcc tgg aag
Ala Leu Asp Leu Leu Asn Asn Lys Leu Gln Gln Asn Ile Ser Trp Lys
1620 1625 1630

aag aca ata gtt acc cgt ttc cta aaa ctg gtt cca gac ctt ttg gcc
Lys Thr Ile Val Thr Arg Phe Leu Lys Leu Val Pro Asp Leu Leu Ala
1635 1640 1645 1650

att gtg cag cgt aag aaa aag gaa ggg gaa gaa gaa caa gca atc aac
Ile Val Gln Arg Lys Lys Lys Glu Gly Glu Glu Glu Gln Ala Ile Asn
1655 1660 1665

aga cag aca gcg ttg tat acc tta aag ctt tta tgc aag aat ttt ggt
Arg Gln Thr Ala Leu Tyr Thr Leu Lys Leu Leu Cys Lys Asn Phe Gly
1670 1675 1680

gca gaa aat cca gat cct ttt gtc cca gtg ctg arc act gct gtg aaa
Ala Glu Asn Pro Asp Pro Phe Val Pro Val Leu Xaa Thr Ala Val Lys
1685 1690 1695

ctg att gct cca gag aga aag gag gag aag aat gtc ytg gga agc gcg
Leu Ile Ala Pro Glu Arg Lys Glu Glu Lys Asn Val Leu Gly Ser Ala
1700 1705 1710

ctg ctg tgc ata gca gag gtg acc tcc acc ctg gag gcg ctg gcc atc
Leu Leu Cys Ile Ala Glu Val Thr Ser Thr Leu Glu Ala Leu Ala Ile
1715 1720 1725 1730

ccc cag ctt ccc agc ctg atg cca tcg ttg ctg aca aca atg aag aac
Pro Gln Leu Pro Ser Leu Met Pro Ser Leu Leu Thr Thr Met Lys Asn
1735 1740 1745

acc agc gag ctg gtc tcc agc gag gtc tac ctg ctc agt gcc ttg gct
Thr Ser Glu Leu Val Ser Ser Glu Val Tyr Leu Leu Ser Ala Leu Ala
1750 1755 1760

gct ctg cag aag gtt gtg gag act ctc ccg cac ttc atc agc ccc tat
Ala Leu Gln Lys Val Val Glu Thr Leu Pro His Phe Ile Ser Pro Tyr
1765 1770 1775

ctg gaa ggc att ctc tcc cag gtg att cat ctg gag aaa atc act agt
Leu Glu Gly Ile Leu Ser Gln Val Ile His Leu Glu Lys Ile Thr Ser
1780 1785 1790

gaa atg ggt tct gcg tca cag gct aat atc cgt ctc aca tct ctt aaa
Glu Met Gly Ser Ala Ser Gln Ala Asn Ile Arg Leu Thr Ser Leu Lys
1795 1800 1805 1810

aag aca ctg gct acc aca ctt gca ccc cga gtc ctg ttg ccc gcc atc
Lys Thr Leu Ala Thr Thr Leu Ala Pro Arg Val Leu Leu Pro Ala Ile
1815 1820 1825

aaa aaa act tac aag cag att gag aag aac tgg aag aat cac atg ggt
Lys Lys Thr Tyr Lys Gln Ile Glu Lys Asn Trp Lys Asn His Met Gly
1830 1835 1840

ccg ttt atg agc atc ttg caa gag cat att ggg gyg atg aag aag gaa
Pro Phe Met Ser Ile Leu Gln Glu His Ile Gly Xaa Met Lys Lys Glu
1845 1850 1855

gag ctc acc tcc cat cag tct cag cta acc gcc ttt ttc ctg gar gcc
Glu Leu Thr Ser His Gln Ser Gln Leu Thr Ala Phe Phe Leu Glu Ala
1860 1865 1870

ctg gac ttc cga gcc cag cac tct gag aac gat ctg gag gaa gtt gga
Leu Asp Phe Arg Ala Gln His Ser Glu Asn Asp Leu Glu Glu Val Gly
1875 1880 1885 1890

aaa acg gaa aat tgt atc att gac tgt cta gta gcc atg gtt gtc aaa
Lys Thr Glu Asn Cys Ile Ile Asp Cys Leu Val Ala Met Val Val Lys
1895 1900 1905

ctt tcc gag gtc aca ttc agg ccc ctg ttc ttc aag ctg ttt gat tgg
Leu Ser Glu Val Thr Phe Arg Pro Leu Phe Phe Lys Leu Phe Asp Trp
1910 1915 1920

gct aaa aca gaa gat gcc cca aag gac agg ttg ttg aca ttt tac aac
Ala Lys Thr Glu Asp Ala Pro Lys Asp Arg Leu Leu Thr Phe Tyr Asn
1925 1930 1935

ttg gca gat tgc att gct gaa aag ctg aaa ggg ctt ttt act ctg ttt
Leu Ala Asp Cys Ile Ala Glu Lys Leu Lys Gly Leu Phe Thr Leu Phe
1940 1945 1950

gcc ggc cac tta gtg aag cct ttt gct gac acc ttg rac cag gtg aac
Ala Gly His Leu Val Lys Pro Phe Ala Asp Thr Leu Xaa Gln Val Asn
1955 1960 1965 1970

atc tcc aaa aca gat gaa gca ttt ttt gac tct gaa aat gac cct gaa
Ile Ser Lys Thr Asp Glu Ala Phe Phe Asp Ser Glu Asn Asp Pro Glu
1975 1980 1985

aag tgc tgc ttg ctg ttg cag ttt att ttg aac tgt tta tac aaa atc
Lys Cys Cys Leu Leu Leu Gln Phe Ile Leu Asn Cys Leu Tyr Lys Ile
1990 1995 2000

ttc ctt ttt gat acc cag cat ttt ata agt aaa gag aga gca gra gcc
Phe Leu Phe Asp Thr Gln His Phe Ile Ser Lys Glu Arg Ala Xaa Ala
2005 2010 2015

ttg atg atg cct ctg gtg gat cag ctg gaa aac agg ctt ggg gga gaa 6214
Leu Met Met Pro Leu Val Asp Gln Leu Glu Asn Arg Leu Gly Gly Glu
2020 2025 2030

gag aaa ttc cag gaa cgg gtg aca aag cac ctg ata cca tgc atc gca 6262
Glu Lys Phe Gln Glu Arg Val Thr Lys His Leu Ile Pro Cys Ile Ala
2035 2040 2045 2050

cag ttt tcr gtg gcc atg gcg gat gac tct ctt tgg aaa cca ctg aac 6310
Gln Phe Ser Val Ala Met Ala Asp Asp Ser Leu Trp Lys Pro Leu Asn
2055 2060 2065

tac cag att ctg cta aag acg aga gac tcc tcg cct aag gtt cga ttt 6358
Tyr Gln Ile Leu Leu Lys Thr Arg Asp Ser Ser Pro Lys Val Arg Phe
2070 2075 2080

gct gct ttg att act gtg tta gca ctg gct gaa aaa cta aag gag aat 6406
Ala Ala Leu Ile Thr Val Leu Ala Leu Ala Glu Lys Leu Lys Glu Asn
2085 2090 2095

tat att gtc ttg cta cca gaa tcc att cct ttc tta gca gag ttg atg 6454
Tyr Ile Val Leu Leu Pro Glu Ser Ile Pro Phe Leu Ala Glu Leu Met
2100 2105 2110

gaa gat gaa tgt gaa gaa gta gaa cat cag tgc caa aag act att cag 6502
Glu Asp Glu Cys Glu Glu Val Glu His Gln Cys Gln Lys Thr Ile Gln
2115 2120 2125 2130

caa ctg gaa act gtc ctg gga gag cca ctc cag agc tat ttc taa 6547
Gln Leu Glu Thr Val Leu Gly Glu Pro Leu Gln Ser Tyr Phe *
2135 2140 2145

gactttctgt ggtgtttcat actctactca gagttcacac tcatatttca tatttttatt 6607 
tttgggtgtt gggtgccatg ttacttttgg tgccttaata cacctacttg gattacttac 6667
aaatgtttta tcacttcgtt acaaaatccc cacctggctt gtgctgccac ataagcctct 6727
cctgcctatc gtatagagct gcagaaagag taaatgatac acggtatttt tatacagact 6787
gctgtgtttg tttaaacatt tattattctc ttcctgattg atggtaataa tattagactt 6847
gttaatttta gcacccaaag ctgacgcctc atttgcactg taagccttaa ctcttctgta 6907
cagcagtatc ttatatacat ggtatccatg ttgcagattt cactcaaagt tgctctattt 6967
caagaaaatg aagttattta gcaatcaaca gaagtacttt tgactgtaaa gcctactttt 7027
cattttgggt aggcgaactt cagccttcgt ttctttgttg tgcccataaa gagaagtggt 7087
tctggaatgc tttttttaac ccaggagtgt gactgtcacc tttatccttt gttcttttgg 7147
gaaacgggag agatgaaggc aacacgctgc ttctaaaaca gctcatacct ggctgctcac 7207
acagagggcc cagaaacact gggtggcacg aggaagctcc tccaggattc agaatgaacc 7267
cagttccatt ggtggttaac taagaactac ttgtctaaga aaacccccta tgatctgatt 7327
caccaggctt acctcygaag ttctacagga tcatgtccca aatccagtct tttcaggtgg 7387
gagaaacaag cttctagaac tatggttttg tcataaaata aaagaatctt agtgacgaga 7447
gggatcttag gaggagtata aattaattca tctcaatagc tcaaaggatg agatagccta 7507
ttttgtgaaa tacatttttt gaatggctta cagactatga tgttagtact aaaaaatgct 7567
gaattatttg atatgaggaa aatgtatctg aaattatgta aaatgtaaag acaaaatgat 7627
actaaaaatg tataaatagt atacatgggc cgggcgcggt ggcttatgcc tgtaatccca 7687
gcactttggg aggccgaggc agatggatca cgaggtcagg agttcgagac cagcctggac 7747
aacatagtga aaccctgtct ctactaaaaa tacaaaaatt agccaggagc ggtggcaggc 7807
gcctgtagtc ccagctactt tggaggctga gacaggagaa tcgcttgaac ctgggaggcg 7867
gagattgcag tgagctgaga ctgcgccact gccctccagc ctaggtgaca gagcaagctc 7927
tgtaa 7932

<210> 4 

<211> 845 

<212> DNA 

<213> Homo sapiens 

<220> 

<221> allele 
<222> 256 

<223> 5-403-156 : polymorphic base C or T 
<400> 4 

taggagggag ggcgctcggg gctgggactt ttcaggacca gggtggtcac cgcacaggcc 60
ccgcctgcct ggaccaagcg ctggccttcc cggggcgccc aggtccacgg ggtcaacgcc 120
agggttttct cagcttcctc gtctgcctcg gatccaagtc cagacagtgc cagaagagac 180
ttggaggcgc tgctttttga cagtacacac ctctgtatgc agacccccta tgatctgatt 240
caccaggctt acctcygaag ttctacagga tcatgtccca aatccagtct tttcaggtgg 300
gagaaacaag cttctagaac tatggttttg tcataaaata aaagaatctt agtgacgaga 360
gggatcttag gaggagtata aattaattca tctcaatagc tcaaaggatg agatagccta 420
ttttgtgaaa tacatttttt gaatggctta cagactatga tgttagtact aaaaaatgct 480
gaattatttg atatgaggaa aatgtatctg aaattatgta aaatgtaaag acaaaatgat 540
actaaaaatg tataaatagt atacatgggc cgggcgcggt ggcttatgcc tgtaatccca 600
gcactttggg aggccgaggc agatggatca cgaggtcagg agttcgagac cagcctggac 660
aacatagtga aaccctgtct ctactaaaaa tacaaaaatt agccaggagc ggtggcaggc 720
gcctgtagtc ccagctactt tggaggctga gacaggagaa tcgcttgaac ctgggaggcg 780
gagattgcag tgagctgaga ctgcgccact gccctccagc ctaggtgaca gagcaagctc 840
tgtaa 845
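
By way of illustration only, the biallelic marker annotated at position 256 of SEQ ID NO: 4 (polymorphic base C or T, printed as the IUPAC code y) can be expanded into its two allelic forms. The Python sketch below is not part of the sequence listing; the helper name `alleles_at` is hypothetical, and the fragment is positions 241 to 260 of SEQ ID NO: 4 as printed above.

```python
def alleles_at(seq, position, alternatives):
    """Return one sequence per allele, substituting at a 1-based position.

    Hypothetical helper for illustration; not part of the listing.
    """
    idx = position - 1
    return {base: seq[:idx] + base + seq[idx + 1:] for base in alternatives}


# Positions 241-260 of SEQ ID NO: 4; 'y' is the IUPAC code for c or t.
fragment = "caccaggcttacctcygaag"

# Position 256 of the full sequence is position 16 of this fragment.
variants = alleles_at(fragment, 256 - 240, "ct")

for base, allele in variants.items():
    print(base, allele)
```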

<210> 5 

<211> 2144 

<212> PRT 

<213> Homo sapiens 



<220> 

<221> VARIANT 
<222> 1694 

<223> Xaa=Ser or Asn 



<220> 

<221> VARIANT 
<222> 1854 

<223> Xaa=Ala or Val 



<220> 

<221> VARIANT 
<222> 1967 

<223> Xaa=Asp or Asn 
<220> 

<221> VARIANT 
<222> 2017 

<223> Xaa=Gly or Glu 



<400> 5

Met Thr Ser Leu Ala Gln Gln Leu Gln Arg Leu Ala Leu Pro Gln Ser
1 5 10 15

Asp Ala Ser Leu Leu Ser Arg Asp Glu Val Ala Ser Leu Leu Phe Asp
20 25 30

Pro Lys Glu Ala Ala Thr Ile Asp Arg Asp Thr Ala Phe Ala Ile Gly
35 40 45

Cys Thr Gly Leu Glu Glu Leu Leu Gly Ile Asp Pro Ser Phe Glu Gln
50 55 60

Phe Glu Ala Pro Leu Phe Ser Gln Leu Ala Lys Thr Leu Glu Arg Ser
65 70 75 80

Val Gln Thr Lys Ala Val Asn Lys Gln Leu Asp Glu Asn Ile Ser Leu
85 90 95

Phe Leu Ile His Leu Ser Pro Tyr Phe     Leu Lys Pro Ala Gln Lys
100 105 110

Cys Leu Glu Trp Leu Ile His Arg Phe His Ile His Leu Tyr Asn Gln
115 120 125

Asp Ser Leu Ile Ala Cys Val Leu Pro Tyr His Glu Thr Arg Ile Phe
130 135 140

Val Arg Val Ile Gln     Leu Lys Ile Asn Asn Ser Lys His Arg Trp
145 150 155 160

Phe Trp Leu Leu Pro Val Lys Gln Ser Gly Val Pro Leu Ala Lys Gly
165 170 175

Thr Leu Ile Thr His Cys Tyr Lys Asp Leu Gly Phe Met Asp Phe Ile
180 185 190

Cys Ser Leu Val Thr Lys Ser Val Lys Val Phe Ala Glu Tyr Pro Gly
195 200 205

Ser Ser Ala Gln Leu Arg Val Leu Leu Ala Phe Tyr Ala Ser Thr Ile
210 215 220

Val Ser Ala Leu Val Ala Ala Glu Asp Val Ser Asp Asn Ile Ile Ala
225 230 235 240

Lys Leu Phe Pro Tyr Ile Gln Lys Gly Leu Lys Ser Ser Leu Pro Asp
245 250 255

Tyr Arg Ala Ala Thr Tyr Met Ile Ile Cys Gln Ile Ser Val Lys Val
260 265 270

Thr Met Glu Asn Thr Phe Val Asn Ser Leu Ala Ser Gln Ile Ile Lys
275 280 285

Thr Leu Thr Lys Ile Pro Ser Leu Ile Lys Asp Gly Leu Ser Cys Leu
290 295 300

Ile Val Leu Leu Gln Arg Gln Lys Pro Glu Ser Leu Gly Lys Lys Pro
305 310 315 320

Phe Pro His Leu Cys Asn Val Pro Asp Leu Ile Thr Ile Leu His Gly
325 330 335

Ile Ser Glu Thr Tyr Asp Val Ser Pro Leu Leu Arg Tyr Met Leu Pro
340 345 350

His Leu Val Val Ser Ile Ile His His Val Thr Gly Glu Glu Thr Glu
355 360 365

Gly Met Asp Gly Gln Ile Tyr Lys Arg His Leu Glu Ala Ile Leu Thr
370 375 380

Lys Ile Ser Leu Lys Asn Asn Leu Asp His Leu Leu Ala Ser Leu Leu
385 390 395 400

Phe Glu Glu Tyr Ile Ser Tyr Ser Ser Gln Glu Glu Met Asp Ser Asn
405 410 415

Lys Val Ser Leu Leu Asn Glu Gln Phe Leu Pro Leu Ile Arg Leu Leu
420 425 430

Glu Ser Lys Tyr Pro Arg Thr Leu Asp Val Val Leu Glu Glu His Leu
435 440 445

Lys Glu Ile Ala Asp Leu Lys Lys Gln Glu Leu Phe His Gln Phe Val
450 455 460

Ser Leu Ser Thr Ser Gly Gly Lys Tyr Gln Phe Leu Ala Asp Ser Asp
465 470 475 480

Thr Ser Leu Met Leu Ser Leu Asn His Pro Leu Ala Pro Val Arg Ile
485 490 495

Leu Ala Met Asn His Leu Lys Lys Ile Met Lys Thr Ser Lys Glu Gly
500 505 510

Val Asp Glu Ser Phe Ile Lys Glu Ala Val Leu Ala Arg Leu Gly Asp
515 520 525

Asp Asn Ile Asp Val Val Leu Ser Ala Ile Ser Ala Phe Glu Ile Phe
530 535 540

Lys Glu His Phe Ser Ser Glu Val Thr Ile Ser Asn Leu Leu Asn Leu
545 550 555 560

Phe Gln Arg Ala Glu Leu Ser Lys Asn Gly Glu Trp Tyr Glu Val Leu
565 570 575

Lys Ile Ala Ala Asp Ile Leu Ile Lys Glu Glu Ile Leu Ser Glu Asn
580 585 590

Asp Gln Leu Ser Asn Gln Val Val Val Cys Leu Leu Pro Phe Val Val
595 600 605

Ile Asn Asn Asp Asp Thr Glu Ser Ala Glu Met Lys Ile Ala Ile Tyr
610 615 620

Leu Ser Lys Ser Gly Ile Cys Ser Leu His Pro Leu Leu Arg Gly Trp
625 630 635 640

Glu Glu Ala Leu Glu Asn Val Ile Lys Ser Thr Lys Pro Gly Lys Leu
645 650 655

Ile Gly Val Ala Asn Gln Lys Met Ile Glu Leu Leu Ala Asp Asn Ile
660 665 670

Asn Leu Gly Asp Pro Ser Ser Met Leu Lys Met Val Glu Asp Leu Ile
675 680 685

Ser Val Gly Glu Glu Glu Ser Phe Asn Leu Lys Gln Lys Val Thr Phe
690 695 700

His Val Ile Leu Ser Val Leu Val Ser Cys Cys Ser Ser Leu Lys Glu
705 710 715 720

Thr His Phe Pro Phe Ala Ile Arg Val Phe Ser Leu Leu Gln Lys Lys
725 730 735

Ile Lys Lys Leu Glu Ser Val Ile Thr Ala Val Glu Ile Pro Ser Glu
740 745 750

Trp His Ile Glu Leu Met Leu Asp Arg Gly Ile Pro Val Glu Leu Trp
755 760 765

Ala His Tyr Val Glu Glu Leu Asn Ser Thr Gln Arg Val Ala Val Glu
770 775 780

Asp Ser Val Phe Leu Val Phe Ser Leu Lys Lys Phe Ile Tyr Ala Leu
785 790 795 800

Lys Ala Pro Lys Ser Phe Pro Lys Gly Asp Ile Trp Trp Asn Pro Glu
805 810 815

Gln Leu Lys Glu Asp Ser Arg Asp Tyr Leu His Leu Leu Ile Gly Leu
820 825 830

Phe Glu Met Met Leu Asn Gly Ala Asp Ala Val His Phe Arg Val Leu
835 840 845

Met Lys Leu Phe Ile Lys Val His Leu Glu Asp Val Phe Gln Leu Phe
850 855 860

Lys Phe Cys Ser Val Leu Trp Thr Tyr Gly Ser Ser Leu Ser Asn Pro
865 870 875 880

Leu Asn Cys Ser Val Lys Thr Val Leu Gln Thr Gln Ala Leu Tyr Val
885 890 895

Gly Cys Ala Met Leu Ser Ser Gln Lys Thr Gln Cys Lys His Gln Leu
900 905 910

Ala Ser Ile Ser Ser Pro Val Val Thr Ser Leu Leu Ile Asn Leu Gly
915 920 925

Ser Pro Val Lys Glu Val Arg Arg Ala Ala Ile Gln Cys Leu Gln Ala
930 935 940

Leu Ser Gly Val Ala Ser Pro Phe Tyr Leu Ile Ile Asp His Leu Ile
945 950 955 960

Ser Lys Ala Glu Glu Ile Thr Ser Asp Ala Ala Tyr Val Ile Gln Asp
965 970 975

Leu Ala Thr Leu Phe Glu Glu Leu Gln Arg Glu Lys Lys Leu Lys Ser
980 985 990

His Gln Lys Leu Ser Glu Thr Leu Lys Asn Leu Leu Ser Cys Val Tyr
995 1000 1005

Ser Cys Pro Ser Tyr Ile Ala Lys Asp Leu Met Lys Val Leu Gln Gly
1010 1015 1020

Val Asn Gly Glu Met Val Leu Ser Gln Leu Leu Pro Met Ala Glu Gln
1025 1030 1035 1040

Leu Leu Glu Lys Ile Gln Lys Glu Pro Thr Ala Val Leu Lys Asp Glu
1045 1050 1055

Ala Met Val Leu His Leu Thr Leu Gly Lys Tyr Asn Glu Phe Ser Val
1060 1065 1070

Ser Leu Leu Asn Glu Asp Pro Lys Ser Leu Asp Ile Phe Ile Lys Ala
1075 1080 1085

Val His Thr Thr Lys Glu Leu Tyr Ala Gly Met Pro Thr Ile Gln Ile
1090 1095 1100

Thr Ala Leu Glu Lys Ile Thr Lys Pro Phe Phe Ala Ala Ile Ser Asp
1105 1110 1115 1120

Glu Lys Val Gln Gln Lys Leu Leu Arg Met Leu Phe Asp Leu Leu Val
1125 1130 1135

Asn Cys Lys Asn Ser His Cys Ala Gln Thr Val Ser Ser Val Phe Lys
1140 1145 1150

Gly Ile Ser Val Asn Ala Glu Gln Val Arg Ile Glu Leu Glu Pro Pro
1155 1160 1165

Asp Lys Ala Lys Pro Leu Gly Thr Val Gln Gln Lys Arg Arg Gln Lys
1170 1175 1180

Met Gln Gln Lys Lys Ser Gln Asp Leu Glu Ser Val Gln Glu Val Gly
1185 1190 1195 1200

Gly Ser Tyr Trp Gln Arg Val Thr Leu Ile Leu Glu Leu Leu Gln His
1205 1210 1215

Lys Lys Lys Leu Arg Ser Pro Gln Ile Leu Val Pro Thr Leu Phe Asn
1220 1225 1230

Leu Leu Ser Arg Cys Leu Glu Pro Leu Pro Gln Glu Gln Gly Asn Met
1235 1240 1245

Glu Tyr Thr Lys Gln Leu Ile Leu Ser Cys Leu Leu Asn Ile Cys Gln
1250 1255 1260

Lys Leu Ser Pro Asp Gly Gly Lys Ile Pro Lys Asp Ile Leu Asp Glu
1265 1270 1275 1280

Glu Lys Phe Asn Val Glu Leu Ile Val Gln Cys Ile Arg Leu Ser Glu
1285 1290 1295

Met Pro Gln Thr His His His Ala Leu Leu Leu Leu Gly Thr Val Ala
1300 1305 1310

Gly Ile Phe Pro Asp Lys Val Leu His Asn Ile Met Ser Ile Phe Thr
1315 1320 1325

Phe Met Gly Ala Asn Val Met Arg Leu Asp Asp Thr Tyr Ser Phe Gln
1330 1335 1340

Val Ile Asn Lys Thr Val Lys Met Val Ile Pro Ala Leu Ile Gln Ser
1345 1350 1355 1360

Asp Ser Gly Asp Ser Ile Glu Val Ser Arg Asn Val Glu Glu Ile Val
1365 1370 1375

Val Lys Ile Ile Ser Val Phe Val Asp Ala Leu Pro His Val Pro Glu
1380 1385 1390

His Arg Arg Leu Pro Ile Leu Val Gln Leu Val Asp Thr Leu Gly Ala
1395 1400 1405

Glu Lys Phe Leu Trp Ile Leu Leu Ile Leu Leu Phe Glu Gln Tyr Val
1410 1415 1420

Thr Lys Thr Val Leu Ala Ala Ala Tyr Gly Glu Lys Asp Ala Ile Leu
1425 1430 1435 1440

Glu Ala Asp Thr Glu Phe Trp Phe Ser Val Cys Cys Glu Phe Ser Val
1445 1450 1455

Gln His Gln Ile Gln Ser Leu Met Asn Ile Leu Gln Tyr Leu Leu Lys
1460 1465 1470

Leu Pro Glu Glu Lys Glu Glu Thr Ile Pro Lys Ala Val Ser Phe Asn
1475 1480 1485

Lys Ser Glu Ser Gln Glu Glu Met Leu Gln Val Phe Asn Val Glu Thr
1490 1495 1500

His Thr Ser Lys Gln Leu Arg His Phe Lys Phe Leu Ser Val Ser Phe
1505 1510 1515 1520

Met Ser Gln Leu Leu Ser Ser Asn Asn Phe Leu Lys Lys Val Val Glu
1525 1530 1535

Ser Gly Gly Pro Glu Ile Leu Lys Gly Leu Glu Glu Arg Leu Leu Glu
1540 1545 1550

Thr Val Leu Gly Tyr Ile Ser Ala Val Ala Gln Ser Met Glu Arg Asn
1555 1560 1565

Ala Asp Lys Leu Thr Val Lys Phe Trp Arg Ala Leu Leu Ser Lys Ala
1570 1575 1580

Tyr Asp Leu Leu Asp Lys Val Asn Ala Leu Leu Pro Thr Glu Thr Phe
1585 1590 1595 1600

Ile Pro Val Ile Arg Gly Leu Val Gly Asn Pro Leu Pro Ser Val Arg
1605 1610 1615

Arg Lys Ala Leu Asp Leu Leu Asn Asn Lys Leu Gln Gln Asn Ile Ser
1620 1625 1630

Trp Lys Lys Thr Ile Val Thr Arg Phe Leu Lys Leu Val Pro Asp Leu
1635 1640 1645

Leu Ala Ile Val Gln Arg Lys Lys Lys Glu Gly Glu Glu Glu Gln Ala
1650 1655 1660

lie Asn Arg Gin Thr Ala Leu Tyr Thr Leu Lys Leu Leu Cys Lys Asn 
1665 1670 1675 1680 

Phe Gly Ala Glu Asn Pro Asp Pro Phe Val Pro Val Leu Xaa Thr Ala 

1685 1690 1695 

Val Lys Leu Ile Ala Pro Glu Arg Lys Glu Glu Lys Asn Val Leu Gly
1700 1705 1710

Ser Ala Leu Leu Cys Ile Ala Glu Val Thr Ser Thr Leu Glu Ala Leu
1715 1720 1725

Ala Ile Pro Gln Leu Pro Ser Leu Met Pro Ser Leu Leu Thr Thr Met
1730 1735 1740

Lys Asn Thr Ser Glu Leu Val Ser Ser Glu Val Tyr Leu Leu Ser Ala
1745 1750 1755 1760

Leu Ala Ala Leu Gln Lys Val Val Glu Thr Leu Pro His Phe Ile Ser
1765 1770 1775

Pro Tyr Leu Glu Gly Ile Leu Ser Gln Val Ile His Leu Glu Lys Ile
1780 1785 1790

Thr Ser Glu Met Gly Ser Ala Ser Gln Ala Asn Ile Arg Leu Thr Ser
1795 1800 1805

Leu Lys Lys Thr Leu Ala Thr Thr Leu Ala Pro Arg Val Leu Leu Pro
1810 1815 1820

Ala Ile Lys Lys Thr Tyr Lys Gln Ile Glu Lys Asn Trp Lys Asn His
1825 1830 1835 1840

Met Gly Pro Phe Met Ser Ile Leu Gln Glu His Ile Gly Xaa Met Lys
1845 1850 1855

Lys Glu Glu Leu Thr Ser His Gln Ser Gln Leu Thr Ala Phe Phe Leu
1860 1865 1870

Glu Ala Leu Asp Phe Arg Ala Gln His Ser Glu Asn Asp Leu Glu Glu
1875 1880 1885

Val Gly Lys Thr Glu Asn Cys Ile Ile Asp Cys Leu Val Ala Met Val
1890 1895 1900

Val Lys Leu Ser Glu Val Thr Phe Arg Pro Leu Phe Phe Lys Leu Phe
1905 1910 1915 1920

Asp Trp Ala Lys Thr Glu Asp Ala Pro Lys Asp Arg Leu Leu Thr Phe
1925 1930 1935

Tyr Asn Leu Ala Asp Cys Ile Ala Glu Lys Leu Lys Gly Leu Phe Thr
1940 1945 1950

Leu Phe Ala Gly His Leu Val Lys Pro Phe Ala Asp Thr Leu Xaa Gln
1955 1960 1965

Val Asn Ile Ser Lys Thr Asp Glu Ala Phe Phe Asp Ser Glu Asn Asp
1970 1975 1980

Pro Glu Lys Cys Cys Leu Leu Leu Gln Phe Ile Leu Asn Cys Leu Tyr
1985 1990 1995 2000

Lys Ile Phe Leu Phe Asp Thr Gln His Phe Ile Ser Lys Glu Arg Ala
2005 2010 2015

Xaa Ala Leu Met Met Pro Leu Val Asp Gln Leu Glu Asn Arg Leu Gly
2020 2025 2030

Gly Glu Glu Lys Phe Gln Glu Arg Val Thr Lys His Leu Ile Pro Cys
2035 2040 2045

Ile Ala Gln Phe Ser Val Ala Met Ala Asp Asp Ser Leu Trp Lys Pro
2050 2055 2060

Leu Asn Tyr Gln Ile Leu Leu Lys Thr Arg Asp Ser Ser Pro Lys Val
2065 2070 2075 2080

Arg Phe Ala Ala Leu Ile Thr Val Leu Ala Leu Ala Glu Lys Leu Lys
2085 2090 2095

Glu Asn Tyr Ile Val Leu Leu Pro Glu Ser Ile Pro Phe Leu Ala Glu
2100 2105 2110

Leu Met Glu Asp Glu Cys Glu Glu Val Glu His Gln Cys Gln Lys Thr
2115 2120 2125

Ile Gln Gln Leu Glu Thr Val Leu Gly Glu Pro Leu Gln Ser Tyr Phe
2130 2135 2140

<210> 6
<211> 5406
<212> DNA
<213> Homo sapiens

<220>
<221> 5'UTR
<222> 1..198

<220>
<221> CDS
<222> 199..1149

<220>
<221> 3'UTR
<222> 1150..5406

<220>
<221> polyA_signal
<222> 5384..5389

<400> 6

ccgcccacgg acgccagagc cgggaaccct gacggcactt agctgctgac aaacaacctg 60 
ctccgtggag cgcctgaaac accagtcttt ggggccagtg cctcagtttc aatccaggta 120 
acctttaaat gaaacttgcc taaaatctta ggtcatacac agaagagact ccaatcgaca 180 
agaagctgga aaagaatg atg ttg tcc tta aac aac cta cag aat atc atc 231
Met Leu Ser Leu Asn Asn Leu Gln Asn Ile Ile
1 5 10

tat aac ccg gta atc ccg tat gtt ggc acc att ccc gat cag ctg gat 279
Tyr Asn Pro Val Ile Pro Tyr Val Gly Thr Ile Pro Asp Gln Leu Asp
15 20 25

cct gga act ttg att gtg ata tgt ggg cat gtt cct agt gac gca gac 327
Pro Gly Thr Leu Ile Val Ile Cys Gly His Val Pro Ser Asp Ala Asp
30 35 40

aga ttc cag gtg gat ctg cag aat ggc agc agt gtg aaa cct cga gcc 375
Arg Phe Gln Val Asp Leu Gln Asn Gly Ser Ser Val Lys Pro Arg Ala
45 50 55

gat gtg gcc ttt cat ttc aat cct cgt ttc aaa agg gcc ggc tgc att 423
Asp Val Ala Phe His Phe Asn Pro Arg Phe Lys Arg Ala Gly Cys Ile
60 65 70 75

gtt tgc aat act ttg ata aat gaa aaa tgg gga cgg gaa gag atc acc 471
Val Cys Asn Thr Leu Ile Asn Glu Lys Trp Gly Arg Glu Glu Ile Thr
80 85 90

tat gac acg cct ttc aaa aga gaa aag tct ttt gag atc gtg att atg 519
Tyr Asp Thr Pro Phe Lys Arg Glu Lys Ser Phe Glu Ile Val Ile Met
95 100 105

gtg cta aag gac aaa ttc cag gtg gct gta aat gga aaa cat act ctg 567
Val Leu Lys Asp Lys Phe Gln Val Ala Val Asn Gly Lys His Thr Leu
110 115 120

ctc tat ggc cac agg atc ggc cca gag aaa ata gac act ctg ggc att 615
Leu Tyr Gly His Arg Ile Gly Pro Glu Lys Ile Asp Thr Leu Gly Ile
125 130 135

tat ggc aaa gtg aat att cac tca att ggt ttt agc ttc agc tcg gac 663
Tyr Gly Lys Val Asn Ile His Ser Ile Gly Phe Ser Phe Ser Ser Asp
140 145 150 155

tta caa agt acc caa gca tct agt ctg gaa ctg aca gag ata agt aga 711
Leu Gln Ser Thr Gln Ala Ser Ser Leu Glu Leu Thr Glu Ile Ser Arg
160 165 170

gaa aat gtt cca aag tct ggc acg ccc cag ctt agc ctg cca ttc gct 759
Glu Asn Val Pro Lys Ser Gly Thr Pro Gln Leu Ser Leu Pro Phe Ala
175 180 185

gca agg ttg aac acc ccc atg ggc cct gga cga act gtc gtc gtt aaa 807
Ala Arg Leu Asn Thr Pro Met Gly Pro Gly Arg Thr Val Val Val Lys
190 195 200

gga gaa gtg aat gca aat gcc aaa agc ttt aat gtt gac cta cta gca 855
Gly Glu Val Asn Ala Asn Ala Lys Ser Phe Asn Val Asp Leu Leu Ala
205 210 215

gga aaa tca aag gat att gct cta cac ttg aac cca cgc ctg aat att 903
Gly Lys Ser Lys Asp Ile Ala Leu His Leu Asn Pro Arg Leu Asn Ile
220 225 230 235

aaa gca ttt gta aga aat tct ttt ctt cag gag tcc tgg gga gaa gaa 951
Lys Ala Phe Val Arg Asn Ser Phe Leu Gln Glu Ser Trp Gly Glu Glu
240 245 250

gag aga aat att acc tct ttc cca ttt agt cct ggg atg tac ttt gag 999
Glu Arg Asn Ile Thr Ser Phe Pro Phe Ser Pro Gly Met Tyr Phe Glu
255 260 265

atg ata att tac tgt gat gtt aga gaa ttc aag gtt gca gta aat ggc 1047
Met Ile Ile Tyr Cys Asp Val Arg Glu Phe Lys Val Ala Val Asn Gly
270 275 280

gta cac agc ctg gag tac aaa cac aga ttt aaa gag ctc agc agt att 1095
Val His Ser Leu Glu Tyr Lys His Arg Phe Lys Glu Leu Ser Ser Ile
285 290 295

gac acg ctg gaa att aat gga gac atc cac tta ctg gaa gta agg agc 1143
Asp Thr Leu Glu Ile Asn Gly Asp Ile His Leu Leu Glu Val Arg Ser
300 305 310 315

tgg tag cctacctaca cagctgctac aaaaaccaaa atacagaatg gcttctgtga 1199 
Trp * 

tactggcctt gctgaaacgc atctcactgt cattctattg tttatattgt taaaatgagc 1259

ttgtgcacca ttagatcctg ctgggtgttc tcagtccttg ccatgaagta tggtggtgtc 1319 

tagcactgaa tggggaaact gggggcagca acacttatag ccagttaaag ccactctgcc 1379 

ctctctccta ctttggctga ctcttcaaga atgccattca acaagtattt atggagtacc 1439

tactataata cagtagctaa catgtattga gcacagattt tttttggtaa aactgtgagg 1499

agctaggata tatacttggt gaaacaaacc agtatgttcc ctgttctctt gagcttcgac 1559

tcttctgtgc tctattgctg cgcactgctt tttctacagg cattacatca actcctaagg 1619

ggtcctctgg gattagttaa gcagctatta aatcacccga agacactaat ttacagaaga 1679

cacaactcct tccccagtga tcactgtcat aaccagtgct ctaccgtatc ccatcactga 1739

ggactgatgt tgactgacat cattttatcg taataaacat gtggctctat tagctgcaag 1799

ctttaccaag taattggcat gacatctgag cacagaaatt aaggcaaaaa accaaagcaa 1859 

aacaaataca tggtgctgaa attaacttga tgccaagccc aaggcagctg atttctgtgt 1919 

atttgaactt agggcaaatc agagtctaca cagacgccta cagaaagttt caggaagagg 1979

caagatgcat tcaatttgaa agatatttat gggcaacaaa gtaaggtcag gattagactt 2039

caggcattca taaggcaggc actatcagaa agtgtacgcc aactaaggga cccacaaagc 2099 

aggcagaggt aatgcagaaa tctgttttgt tcccatgaaa tcaccaatca aggcctccgt 2159 

tcttctaaag attagtccat catcattagc aactgagatc aaagcactct tccactttac 2219 

gtgattaaaa tcaaacctgt atcagcaagt taaatggttc catttctgtg atttttctat 2279 

tatttgaggg gagttggcag aagttccatg tatatgggat ctttacaggt cagatcttgt 2339

tacaggaaat ttcaaaggtt tgggagtggg gagggaaaaa agctcagtca gtgaggatca 2399

ttttatcaca ttagactggg gcagaactct gccaggattt aggaatattt tcagaacaga 2459

ttttagatat tatttctatc catatattga aaagaatacc attgtcaatc ttattttttt 2519 

aaaagtactc agtgtagaaa ttgctagccc ttaattcttt tccagctttt catattaatg 2579 

tatgcagagt ctcaccaagc tcaaagacac tggttggggg tggagggtgc cacagggaaa 2639

gctgtagaag gcaagaagac tcgagaatcc cccagagtta tttttctcca taaagaccat 2699

cagagtgctt aactgagctg ttggagactg tgaggcattt aggaaaaaaa tagcccactc 2759

acatcattcc ttgtaagtct taagttcatt ttcattttac gtggaggaaa aaaatttaaa 2819

aagctattag tatttattaa tgaattttac tgagacattt cttagaaata tgcacttcta 2879

tactagcaag ctctgtctct aaaatgcaag ttggcctttt gcttgccaca tttctgcatt 2939

aaacttctat attagcttca aaggctttta aactcaatgc gaacattcta cgggatgttc 2999

ttagatgcct ttaaaaaggg ggcagatcta attttatttg aaccctcact ttccaacttc 3059

accatgaccc agtactagag attagggcac ttcaaagcat tgaaaaaaat ctactgatac 3119 

ttactttctt agacaagtag ttcttagtta accaccaatg gaactgggtt cattctgaat 3179 

cctggaggag cttcctcgtg ccacccagtg tttctgggcc ctctgtgtga gcagccaggt 3239

atgagctgtt ttagaagcag cgtgttgcct tcatctctcc cgtttcccaa aagaacaaag 3299

gataaaggtg acagtcacac tcctgggtta aaaaaagcat tccagaacca cttctcttta 3359

tgggcacaac aaagaaacga aggctgaagt tcgcctaccc aaaatgaaaa gtaggcttta 3419

cagtcaaaag tacttctgtt gattgctaaa taacttcatt ttcttgaaat agagcaactt 3479

tgagtgaaat ctgcaacatg gataccatgt atataagata ctgctgtaca gaagagttaa 3539

ggcttacagt gcaaatgagg cgtcagcttt gggtgctaaa attaacaagt ctaatattat 3599

taccatcaat caggaagaga ataataaatg tttaaacaaa cacagcagtc tgtataaaaa 3659

taccgtgtat catttactct ttctgcagct ctatacgata ggcaggagag gcttatgtgg 3719 

cagcacaagc caggtgggga ttttgtaacg aagtgataaa acatttgtaa gtaatccaag 3779 

taggtgtatt aaggcaccaa aagtaacatg gcacccaaca cccaaaaata aaaatatgaa 3839

atatgagtgt gaactctgag tagagtatga aacaccacag aaagtcttag aaatagctct 3899

ggagtggctc tcccaggaca gtttccagtt gctgaatagt cttttggcac tgatgttcta 3959

cttcttcaca ttcatctaaa aaaaaaaaaa aaaaaaatca aaattaaaat ctgagtcagt 4019 

ctgcctgcct cggttctcat tagtttaatt cttaatgcct tgcactttcc agcaatcatt 4079 

caatcaaaag agtgaaatga agcacattaa caaagcagga ggcgccacgg accgcctccc 4139 

tccacaccgc tccttccgcc ttcattcctt gcccacaggc ttgcactgga agctgaataa 4199 

gaatccccaa aactcaaact tcctagggat gccacccctt tagtagctca cacctccccc 4259 

ctccaagagc taagaaacaa aggagaatgt acttttgtag cttagataag caatgaatca 4319 

gtaaaggact gatctacttg ctccaccacc cctcccttaa taataacatt tactgttatt 4379 

tcctgggcct aagacttatg ttccagaact gtcacagctc cccatgtcac acccactagc 4439 

ttgtgatctt tgtcaaataa ctgaaatctt ttaagcctct agtttcttcc tttgtaaaac 4499 

agagataaaa tgttgtggtt tttaagtgag ataatccaag taaagcacct aacatggagt 4559

agtgaatgaa catcggttgc tactaaaagt ggacatccta ccgcatcctt aatgccacta 4619 

ggcatttcca tacaatctgg ggaccaaaac ttcaatcata taaatgtatg aggttaatta 4679 

aaaacactac tgtaatctgc ttgtatgatc acaaaccacc acaaaagaaa agatcgtgaa 4739 

gattacactg taaacggact ctcaaatgat caggaggtgg tcacttcgca acttgctccc 4799

tccacccaac tcaaaacagg agctcgagcc tgcctgtatt tgagactgga gctgcctgta 4859 

tgaggactgg atcaactgct agtcacgtta tatccaaatc tgcattatca ttgggcacat 4919 

tttcacagaa ttttactgaa ttattcctta attgtttaat ggttgggaat agtttgggaa 4979 

ttaccttcca tcaactctgc taagaaagga atggattctg gtagcaagac aatataattc 5039

tcctttagtt tttcagccag tgctaacaca gtaatcaaag cagcaaatcg aacctgaaag 5099 

ggataaaaga gcaaagaaat aaaaagtagt gttactgtat ttattatctt aagagctgta 5159

ctgacttgag acaagctcta actttttaaa cattagttca cacgcgttta ttcacttcat 5219 

tatgttcatt aagctttcat cttagaatac cagtttcacc atttgggagc tgtttgtaat 5279 

atgtgcaacc ttataaatag tgttttccaa actgtgtccc aggactgcaa atctttaatg 5339 

tgaaatgtct ttttataatc tcttccttta aaaaaaacca ataaaataaa atgccacatg 5399 

caaactc 5406 

<210> 7 

<211> 5532 

<212> DNA 

<213> Homo sapiens 

<220>
<221> 5'UTR
<222> 1..198

<220>
<221> CDS
<222> 199..1275

<220>
<221> 3'UTR
<222> 1276..5532

<220>
<221> polyA_signal
<222> 5510..5515

<400> 7

ccgcccacgg acgccagagc cgggaaccct gacggcactt agctgctgac aaacaacctg 60 

ctccgtggag cgcctgaaac accagtcttt ggggccagtg cctcagtttc aatccaggta 120 

acctttaaat gaaacttgcc taaaatctta ggtcatacac agaagagact ccaatcgaca 180 

agaagctgga aaagaatg atg ttg tcc tta aac aac cta cag aat atc atc 231
Met Leu Ser Leu Asn Asn Leu Gln Asn Ile Ile
1 5 10

tat aac ccg gta atc ccg tat gtt ggc acc att ccc gat cag ctg gat 279
Tyr Asn Pro Val Ile Pro Tyr Val Gly Thr Ile Pro Asp Gln Leu Asp
15 20 25

cct gga act ttg att gtg ata tgt ggg cat gtt cct agt gac gca gac 327
Pro Gly Thr Leu Ile Val Ile Cys Gly His Val Pro Ser Asp Ala Asp
30 35 40

aga ttc cag gtg gat ctg cag aat ggc agc agt gtg aaa cct cga gcc 375
Arg Phe Gln Val Asp Leu Gln Asn Gly Ser Ser Val Lys Pro Arg Ala
45 50 55

gat gtg gcc ttt cat ttc aat cct cgt ttc aaa agg gcc ggc tgc att 423
Asp Val Ala Phe His Phe Asn Pro Arg Phe Lys Arg Ala Gly Cys Ile
60 65 70 75

gtt tgc aat act ttg ata aat gaa aaa tgg gga cgg gaa gag atc acc 471
Val Cys Asn Thr Leu Ile Asn Glu Lys Trp Gly Arg Glu Glu Ile Thr
80 85 90

tat gac acg cct ttc aaa aga gaa aag tct ttt gag atc gtg att atg 519
Tyr Asp Thr Pro Phe Lys Arg Glu Lys Ser Phe Glu Ile Val Ile Met
95 100 105

gtg cta aag gac aaa ttc cag gtg gct gta aat gga aaa cat act ctg 567
Val Leu Lys Asp Lys Phe Gln Val Ala Val Asn Gly Lys His Thr Leu
110 115 120

ctc tat ggc cac agg atc ggc cca gag aaa ata gac act ctg ggc att 615
Leu Tyr Gly His Arg Ile Gly Pro Glu Lys Ile Asp Thr Leu Gly Ile
125 130 135

tat ggc aaa gtg aat att cac tca att ggt ttt agc ttc agc tcg gac 663
Tyr Gly Lys Val Asn Ile His Ser Ile Gly Phe Ser Phe Ser Ser Asp
140 145 150 155

tta caa agt acc caa gca tct agt ctg gaa ctg aca gag ata agt aga 711
Leu Gln Ser Thr Gln Ala Ser Ser Leu Glu Leu Thr Glu Ile Ser Arg
160 165 170

gaa aat gtt cca aag tct ggc acg ccc cag ctt cct agt aat aga gga 759
Glu Asn Val Pro Lys Ser Gly Thr Pro Gln Leu Pro Ser Asn Arg Gly
175 180 185

gga gac att tct aaa atc gca ccc aga act gtc tac acc aag agc aaa 807
Gly Asp Ile Ser Lys Ile Ala Pro Arg Thr Val Tyr Thr Lys Ser Lys
190 195 200

gat tcg act gtc aat cac act ttg act tgc acc aaa ata cca cct atg 855
Asp Ser Thr Val Asn His Thr Leu Thr Cys Thr Lys Ile Pro Pro Met
205 210 215

aac tat gtg tca aag agc ctg cca ttc gct gca agg ttg aac acc ccc 903
Asn Tyr Val Ser Lys Ser Leu Pro Phe Ala Ala Arg Leu Asn Thr Pro
220 225 230 235

atg ggc cct gga cga act gtc gtc gtt aaa gga gaa gtg aat gca aat 951
Met Gly Pro Gly Arg Thr Val Val Val Lys Gly Glu Val Asn Ala Asn
240 245 250

gcc aaa agc ttt aat gtt gac cta cta gca gga aaa tca aag gat att 999
Ala Lys Ser Phe Asn Val Asp Leu Leu Ala Gly Lys Ser Lys Asp Ile
255 260 265

gct cta cac ttg aac cca cgc ctg aat att aaa gca ttt gta aga aat 1047
Ala Leu His Leu Asn Pro Arg Leu Asn Ile Lys Ala Phe Val Arg Asn
270 275 280

tct ttt ctt cag gag tcc tgg gga gaa gaa gag aga aat att acc tct 1095
Ser Phe Leu Gln Glu Ser Trp Gly Glu Glu Glu Arg Asn Ile Thr Ser
285 290 295

ttc cca ttt agt cct ggg atg tac ttt gag atg ata att tac tgt gat 1143
Phe Pro Phe Ser Pro Gly Met Tyr Phe Glu Met Ile Ile Tyr Cys Asp
300 305 310 315

gtt aga gaa ttc aag gtt gca gta aat ggc gta cac agc ctg gag tac 1191
Val Arg Glu Phe Lys Val Ala Val Asn Gly Val His Ser Leu Glu Tyr
320 325 330

aaa cac aga ttt aaa gag ctc agc agt att gac acg ctg gaa att aat 1239
Lys His Arg Phe Lys Glu Leu Ser Ser Ile Asp Thr Leu Glu Ile Asn
335 340 345

gga gac atc cac tta ctg gaa gta agg agc tgg tag cctacctaca 1285
Gly Asp Ile His Leu Leu Glu Val Arg Ser Trp *
350 355

cagctgctac aaaaaccaaa atacagaatg gcttctgtga tactggcctt gctgaaacgc 1345

atctcactgt cattctattg tttatattgt taaaatgagc ttgtgcacca ttagatcctg 1405

ctgggtgttc tcagtccttg ccatgaagta tggtggtgtc tagcactgaa tggggaaact 1465

gggggcagca acacttatag ccagttaaag ccactctgcc ctctctccta ctttggctga 1525

ctcttcaaga atgccattca acaagtattt atggagtacc tactataata cagtagctaa 1585

catgtattga gcacagattt tttttggtaa aactgtgagg agctaggata tatacttggt 1645

gaaacaaacc agtatgttcc ctgttctctt gagcttcgac tcttctgtgc tctattgctg 1705

cgcactgctt tttctacagg cattacatca actcctaagg ggtcctctgg gattagttaa 1765 

gcagctatta aatcacccga agacactaat ttacagaaga cacaactcct tccccagtga 1825 

tcactgtcat aaccagtgct ctaccgtatc ccatcactga ggactgatgt tgactgacat 1885

cattttatcg taataaacat gtggctctat tagctgcaag ctttaccaag taattggcat 1945

gacatctgag cacagaaatt aaggcaaaaa accaaagcaa aacaaataca tggtgctgaa 2005

attaacttga tgccaagccc aaggcagctg atttctgtgt atttgaactt agggcaaatc 2065

agagtctaca cagacgccta cagaaagttt caggaagagg caagatgcat tcaatttgaa 2125

agatatttat gggcaacaaa gtaaggtcag gattagactt caggcattca taaggcaggc 2185 

actatcagaa agtgtacgcc aactaaggga cccacaaagc aggcagaggt aatgcagaaa 2245 

tctgttttgt tcccatgaaa tcaccaatca aggcctccgt tcttctaaag attagtccat 2305

catcattagc aactgagatc aaagcactct tccactttac gtgattaaaa tcaaacctgt 2365 

atcagcaagt taaatggttc catttctgtg atttttctat tatttgaggg gagttggcag 2425 

aagttccatg tatatgggat ctttacaggt cagatcttgt tacaggaaat ttcaaaggtt 2485 

tgggagtggg gagggaaaaa agctcagtca gtgaggatca ttttatcaca ttagactggg 2545 

gcagaactct gccaggattt aggaatattt tcagaacaga ttttagatat tatttctatc 2605

catatattga aaagaatacc attgtcaatc ttattttttt aaaagtactc agtgtagaaa 2665

ttgctagccc ttaattcttt tccagctttt catattaatg tatgcagagt ctcaccaagc 2725

tcaaagacac tggttggggg tggagggtgc cacagggaaa gctgtagaag gcaagaagac 2785

tcgagaatcc cccagagtta tttttctcca taaagaccat cagagtgctt aactgagctg 2845

ttggagactg tgaggcattt aggaaaaaaa tagcccactc acatcattcc ttgtaagtct 2905

taagttcatt ttcattttac gtggaggaaa aaaatttaaa aagctattag tatttattaa 2965

tgaattttac tgagacattt cttagaaata tgcacttcta tactagcaag ctctgtctct 3025

aaaatgcaag ttggcctttt gcttgccaca tttctgcatt aaacttctat attagcttca 3085

aaggctttta aactcaatgc gaacattcta cgggatgttc ttagatgcct ttaaaaaggg 3145

ggcagatcta attttatttg aaccctcact ttccaacttc accatgaccc agtactagag 3205 

attagggcac ttcaaagcat tgaaaaaaat ctactgatac ttactttctt agacaagtag 3265 

ttcttagtta accaccaatg gaactgggtt cattctgaat cctggaggag cttcctcgtg 3325 

ccacccagtg tttctgggcc ctctgtgtga gcagccaggt atgagctgtt ttagaagcag 3385

cgtgttgcct tcatctctcc cgtttcccaa aagaacaaag gataaaggtg acagtcacac 3445 

tcctgggtta aaaaaagcat tccagaacca cttctcttta tgggcacaac aaagaaacga 3505 

aggctgaagt tcgcctaccc aaaatgaaaa gtaggcttta cagtcaaaag tacttctgtt 3565 

gattgctaaa taacttcatt ttcttgaaat agagcaactt tgagtgaaat ctgcaacatg 3625

gataccatgt atataagata ctgctgtaca gaagagttaa ggcttacagt gcaaatgagg 3685 

cgtcagcttt gggtgctaaa attaacaagt ctaatattat taccatcaat caggaagaga 3745 

ataataaatg tttaaacaaa cacagcagtc tgtataaaaa taccgtgtat catttactct 3805

ttctgcagct ctatacgata ggcaggagag gcttatgtgg cagcacaagc caggtgggga 3865

ttttgtaacg aagtgataaa acatttgtaa gtaatccaag taggtgtatt aaggcaccaa 3925

aagtaacatg gcacccaaca cccaaaaata aaaatatgaa atatgagtgt gaactctgag 3985

tagagtatga aacaccacag aaagtcttag aaatagctct ggagtggctc tcccaggaca 4045 

gtttccagtt gctgaatagt cttttggcac tgatgttcta cttcttcaca ttcatctaaa 4105 

aaaaaaaaaa aaaaaaatca aaattaaaat ctgagtcagt ctgcctgcct cggttctcat 4165

tagtttaatt cttaatgcct tgcactttcc agcaatcatt caatcaaaag agtgaaatga 4225

agcacattaa caaagcagga ggcgccacgg accgcctccc tccacaccgc tccttccgcc 4285 

ttcattcctt gcccacaggc ttgcactgga agctgaataa gaatccccaa aactcaaact 4345 

tcctagggat gccacccctt tagtagctca cacctccccc ctccaagagc taagaaacaa 4405

aggagaatgt acttttgtag cttagataag caatgaatca gtaaaggact gatctacttg 4465 

ctccaccacc cctcccttaa taataacatt tactgttatt tcctgggcct aagacttatg 4525 

ttccagaact gtcacagctc cccatgtcac acccactagc ttgtgatctt tgtcaaataa 4585 

ctgaaatctt ttaagcctct agtttcttcc tttgtaaaac agagataaaa tgttgtggtt 4645

tttaagtgag ataatccaag taaagcacct aacatggagt agtgaatgaa catcggttgc 4705
tactaaaagt ggacatccta ccgcatcctt aatgccacta ggcatttcca tacaatctgg 4765

ggaccaaaac ttcaatcata taaatgtatg aggttaatta aaaacactac tgtaatctgc 4825 

ttgtatgatc acaaaccacc acaaaagaaa agatcgtgaa gattacactg taaacggact 4885 

ctcaaatgat caggaggtgg tcacttcgca acttgctccc tccacccaac tcaaaacagg 4945 

agctcgagcc tgcctgtatt tgagactgga gctgcctgta tgaggactgg atcaactgct 5005 

agtcacgtta tatccaaatc tgcattatca ttgggcacat tttcacagaa ttttactgaa 5065 

ttattcctta attgtttaat ggttgggaat agtttgggaa ttaccttcca tcaactctgc 5125 

taagaaagga atggattctg gtagcaagac aatataattc tcctttagtt tttcagccag 5185 

tgctaacaca gtaatcaaag cagcaaatcg aacctgaaag ggataaaaga gcaaagaaat 5245 

aaaaagtagt gttactgtat ttattatctt aagagctgta ctgacttgag acaagctcta 5305 

actttttaaa cattagttca cacgcgttta ttcacttcat tatgttcatt aagctttcat 5365 

cttagaatac cagtttcacc atttgggagc tgtttgtaat atgtgcaacc ttataaatag 5425 

tgttttccaa actgtgtccc aggactgcaa atctttaatg tgaaatgtct ttttataatc 5485 

tcttccttta aaaaaaacca ataaaataaa atgccacatg caaactc 5532 



<210> 8 

<211> 2469 

<212> DNA 

<213> Homo sapiens 



<220>
<221> 5'UTR
<222> 1..198

<220>
<221> CDS
<222> 199..1305

<220>
<221> 3'UTR
<222> 1306..2469



<400> 8 

ccgcccacgg acgccagagc cgggaaccct gacggcactt agctgctgac aaacaacctg 60
ctccgtggag cgcctgaaac accagtcttt ggggccagtg cctcagtttc aatccaggta 120
acctttaaat gaaacttgcc taaaatctta ggtcatacac agaagagact ccaatcgaca 180
agaagctgga aaagaatg atg ttg tcc tta aac aac cta cag aat atc atc 231
Met Leu Ser Leu Asn Asn Leu Gln Asn Ile Ile
1 5 10

tat aac ccg gta atc ccg tat gtt ggc acc att ccc gat cag ctg gat 279
Tyr Asn Pro Val Ile Pro Tyr Val Gly Thr Ile Pro Asp Gln Leu Asp
15 20 25

cct gga act ttg att gtg ata tgt ggg cat gtt cct agt gac gca gac 327
Pro Gly Thr Leu Ile Val Ile Cys Gly His Val Pro Ser Asp Ala Asp
30 35 40

aga ttc cag gtg gat ctg cag aat ggc agc agt gtg aaa cct cga gcc 375
Arg Phe Gln Val Asp Leu Gln Asn Gly Ser Ser Val Lys Pro Arg Ala
45 50 55

gat gtg gcc ttt cat ttc aat cct cgt ttc aaa agg gcc ggc tgc att 423
Asp Val Ala Phe His Phe Asn Pro Arg Phe Lys Arg Ala Gly Cys Ile
60 65 70 75

gtt tgc aat act ttg ata aat gaa aaa tgg gga cgg gaa gag atc acc 471
Val Cys Asn Thr Leu Ile Asn Glu Lys Trp Gly Arg Glu Glu Ile Thr
80 85 90

tat gac acg cct ttc aaa aga gaa aag tct ttt gag atc gtg att atg 519
Tyr Asp Thr Pro Phe Lys Arg Glu Lys Ser Phe Glu Ile Val Ile Met
95 100 105

gtg cta aag gac aaa ttc cag gtg gct gta aat gga aaa cat act ctg 567
Val Leu Lys Asp Lys Phe Gln Val Ala Val Asn Gly Lys His Thr Leu
110 115 120

ctc tat ggc cac agg atc ggc cca gag aaa ata gac act ctg ggc att 615
Leu Tyr Gly His Arg Ile Gly Pro Glu Lys Ile Asp Thr Leu Gly Ile
125 130 135

tat ggc aaa gtg aat att cac tca att ggt ttt agc ttc agc tcg gac 663
Tyr Gly Lys Val Asn Ile His Ser Ile Gly Phe Ser Phe Ser Ser Asp
140 145 150 155

tta caa agt acc caa gca tct agt ctg gaa ctg aca gag ata agt aga 711
Leu Gln Ser Thr Gln Ala Ser Ser Leu Glu Leu Thr Glu Ile Ser Arg
160 165 170

gaa aat gtt cca aag tct ggc acg ccc cag ctt agc ctg cca ttc gct 759
Glu Asn Val Pro Lys Ser Gly Thr Pro Gln Leu Ser Leu Pro Phe Ala
175 180 185

gca agg ttg aac acc ccc atg ggc cct gga cga act gtc gtc gtt aaa 807
Ala Arg Leu Asn Thr Pro Met Gly Pro Gly Arg Thr Val Val Val Lys
190 195 200

gga gaa gtg aat gca aat gcc aaa agc ttt aat gtt gac cta cta gca 855
Gly Glu Val Asn Ala Asn Ala Lys Ser Phe Asn Val Asp Leu Leu Ala
205 210 215

gga aaa tca aag gat att gct cta cac ttg aac cca cgc ctg aat att 903
Gly Lys Ser Lys Asp Ile Ala Leu His Leu Asn Pro Arg Leu Asn Ile
220 225 230 235

aaa gca ttt gta aga aat tct ttt ctt cag gag tcc tgg gga gaa gaa 951
Lys Ala Phe Val Arg Asn Ser Phe Leu Gln Glu Ser Trp Gly Glu Glu
240 245 250

gag aga aat att acc tct ttc cca ttt agt cct ggg atg tac ttt gag 999
Glu Arg Asn Ile Thr Ser Phe Pro Phe Ser Pro Gly Met Tyr Phe Glu
255 260 265

atg ata att tac tgt gat gtt aga gaa ttc aag gtt gca gta aat ggc 1047
Met Ile Ile Tyr Cys Asp Val Arg Glu Phe Lys Val Ala Val Asn Gly
270 275 280

gta cac agc ctg gag tac aaa cac aga ttt aaa gag ctc agc agt att 1095
Val His Ser Leu Glu Tyr Lys His Arg Phe Lys Glu Leu Ser Ser Ile
285 290 295

gac acg ctg gaa att aat gga gac atc cac tta ctg gaa caa tca ttc 1143
Asp Thr Leu Glu Ile Asn Gly Asp Ile His Leu Leu Glu Gln Ser Phe
300 305 310 315

aat caa aag agt gaa atg aag cac att aac aaa gca gga ggc gcc acg 1191
Asn Gln Lys Ser Glu Met Lys His Ile Asn Lys Ala Gly Gly Ala Thr
320 325 330

gac cgc ctc cct cca cac cgc tcc ttc cgc ctt cat tcc ttg ccc aca 1239
Asp Arg Leu Pro Pro His Arg Ser Phe Arg Leu His Ser Leu Pro Thr
335 340 345

ggc ttg cac tgg aag ctg aat aag aat ccc caa aac tca aac ttc cta 1287
Gly Leu His Trp Lys Leu Asn Lys Asn Pro Gln Asn Ser Asn Phe Leu
350 355 360

ggg atg cca ccc ctt tag tagctcacac ctcccccctc caagagctaa 1335
Gly Met Pro Pro Leu *
365

gaaacaaagg agaatgtact tttgtagctt agataagcaa tgaatcagta aaggactgat 1395
ctacttgctc caccacccct cccttaataa taacatttac tgttatttcc tgggcctaag 1455
acttatgttc cagaactgtc acagctcccc atgtcacacc cactagcttg tgatctttgt 1515
caaataactg aaatctttta agcctctagt ttcttccttt gtaaaacaga gataaaatgt 1575
tgtggttttt aagtgagata atccaagtaa agcacctaac atggagtagt gaatgaacat 1635
cggttgctac taaaagtgga catcctaccg catccttaat gccactaggc atttccatac 1695
aatctgggga ccaaaacttc aatcatataa atgtatgagg ttaattaaaa acactactgt 1755
aatctgcttg tatgatcaca aaccaccaca aaagaaaaga tcgtgaagat tacactgtaa 1815
acggactctc aaatgatcag gaggtggtca cttcgcaact tgctccctcc acccaactca 1875
aaacaggagc tcgagcctgc ctgtatttga gactggagct gcctgtatga ggactggatc 1935
aactgctagt cacgttatat ccaaatctgc attatcattg ggcacatttt cacagaattt 1995
tactgaatta ttccttaatt gtttaatggt tgggaatagt ttgggaatta ccttccatca 2055
actctgctaa gaaaggaatg gattctggta gcaagacaat ataattctcc tttagttttt 2115
cagccagtgc taacacagta atcaaagcag caaatcgaac ctgaaaggga taaaagagca 2175
aagaaataaa aagtagtgtt actgtattta ttatcttaag agctgtactg acttgagaca 2235
agctctaact ttttaaacat tagttcacac gcgtttattc acttcattat gttcattaag 2295
ctttcatctt agaataccag tttcaccatt tgggagctgt ttgtaatatg tgcaacctta 2355
taaatagtgt tttccaaact gtgtcccagg actgcaaatc tttaatgtga aatgtctttt 2415
tataatctct tcctttaaaa aaaaccaata aaataaaatg ccacatgcaa actc 2469

<210> 9 

<211> 281 

<212> DNA 

<213> Homo sapiens 

<400> 9 

atatcacctg taaagtcttc tggccaaaaa attaagccca gccggacctt attaaacctt 60 

taaatctaac aattagtttt gaagctttta cagattaaat gaagtctgag atttgcttca 120 

aaatgaacca gtggtgggga ggaagtgggt gaggtgtagg tgaaacaaga ttggccacgt 180

cgataattgc tggagctggg cgatgaaagc acagttctag aagcttgttt ctcccacctg 240

aaaagactgg atttgggaca tgatcctgta gaacttcgga g 281 

<210> 10 

<211> 515 

<212> DNA 

<213> Homo sapiens 

<220>
<221> 5'UTR
<222> 1..384

<220>
<221> CDS
<222> 385..515

<400> 10 

tggacttgga tccgaggcag acgaggaagc tgagaaaacc ctggcgttga ccccgtggac 60
ctgggcgccc cgggaaggcc agcgcttggt ccaggcaggc ggggcctgtg cggtgaccac 120
cctggtcctg aaaagtccca gccccgagcg ccctccctcc tagacctgga ggcctggaac 180
agccagccgc ccacggacgc cagagccggg aaccctgacg gcacttagct gctgacaaac 240
aacctgctcc gtggagcgcc tgaaacacca gtctttgggg ccagtgcctc agtttcaatc 300
caggtaacct ttaaatgaaa cttgcctaaa atcttaggtc atacacagaa gagactccaa 360
tcgacaagaa gctggaaaag aatg atg ttg tcc tta aac aac cta cag aat 411
Met Leu Ser Leu Asn Asn Leu Gln Asn
1 5

atc atc tat aac ccg gta atc ccg tat gtt ggc acc att ccc gat cag 459
Ile Ile Tyr Asn Pro Val Ile Pro Tyr Val Gly Thr Ile Pro Asp Gln
10 15 20 25

ctg gat cct gga act ttg att gtg ata tgt ggg cat gtt cct agt gac 507
Leu Asp Pro Gly Thr Leu Ile Val Ile Cys Gly His Val Pro Ser Asp
30 35 40

gca gac ag 515
Ala Asp

<210> 11 

<211> 304 

<212> DNA 

<213> Homo sapiens 

<220>
<221> 5'UTR
<222> 1..173

<220>
<221> CDS
<222> 174..304

<400> 11

ttctagaagc ttgtttctcc cacctgaaaa gactggattt gggacatgat cctgtagaac 60
ttcggagggc cagtgcctca gtttcaatcc aggtaacctt taaatgaaac ttgcctaaaa 120
tcttaggtca tacacagaag agactccaat cgacaagaag ctggaaaaga atg atg 176
Met
1

ttg tcc tta aac aac cta cag aat atc atc tat aac ccg gta atc ccg 224
Leu Ser Leu Asn Asn Leu Gln Asn Ile Ile Tyr Asn Pro Val Ile Pro
5 10 15

tat gtt ggc acc att ccc gat cag ctg gat cct gga act ttg att gtg 272
Tyr Val Gly Thr Ile Pro Asp Gln Leu Asp Pro Gly Thr Leu Ile Val
20 25 30

ata tgt ggg cat gtt cct agt gac gca gac ag 304
Ile Cys Gly His Val Pro Ser Asp Ala Asp
35 40

<210> 12 

<211> 473 

<212> DNA 

<213> Homo sapiens 

<220>
<221> 5'UTR
<222> 1..342

<220>
<221> CDS
<222> 343..473

<400> 12 

ttctagaagc ttgtttctcc cacctgaaaa gactggattt gggacatgat cctgtagaac 60 

ttcggagatt ggggaagata atcggaagag gtaaaagaca ccgtccatga cacttcctgg 120

ggaagcagat gtatgtataa ggatccgccc acggacgcca gagccgggaa ccctgacggc 180

acttagctgc tgacaaacaa cctgctccgt ggagcgcctg aaacaccagt ctttggggcc 240

agtgcctcag tttcaatcca ggtaaccttt aaatgaaact tgcctaaaat cttaggtcat 300

acacagaaga gactccaatc gacaagaagc tggaaaagaa tg atg ttg tcc tta 354
Met Leu Ser Leu
1

aac aac cta cag aat atc atc tat aac ccg gta atc ccg tat gtt ggc 402
Asn Asn Leu Gln Asn Ile Ile Tyr Asn Pro Val Ile Pro Tyr Val Gly
5 10 15 20

acc att ccc gat cag ctg gat cct gga act ttg att gtg ata tgt ggg 450
Thr Ile Pro Asp Gln Leu Asp Pro Gly Thr Leu Ile Val Ile Cys Gly
25 30 35

cat gtt cct agt gac gca gac ag 473
His Val Pro Ser Asp Ala Asp



<210> 13 

<211> 2077 

<212> DNA 

<213> Homo sapiens 

<220>
<221> 5'UTR
<222> 1..265

<220>
<221> CDS
<222> 266..913

<220>
<221> 3'UTR
<222> 914..2077



<400> 13 

ttctagaagc ttgtttctcc cacctgaaaa gactggattt gggacatgat cctgtagaac 60
ttcggagccg cccacggacg ccagagccgg gaaccctgac ggcacttagc tgctgacaaa 120
caacctgctc cgtggagcgc ctgaaacacc agtctttggg gccagtgcct cagtttcaat 180
ccaggtaacc tttaaatgaa acttgcctaa aatcttaggt catacacaga agagactcca 240
atcgacaaga agctggaaaa gaatg atg ttg tcc tta aac aac cta cag aat 292
Met Leu Ser Leu Asn Asn Leu Gln Asn
1 5

atc atc tat aac ccg gta atc ccg tat gtt ggc acc att ccc gat cag 340
Ile Ile Tyr Asn Pro Val Ile Pro Tyr Val Gly Thr Ile Pro Asp Gln
10 15 20 25

ctg gat cct gga act ttg att gtg ata tgt ggg cat gtt cct agt gac 388
Leu Asp Pro Gly Thr Leu Ile Val Ile Cys Gly His Val Pro Ser Asp
30 35 40

gca gac aga ttc cag gtg gat ctg cag aat ggc agc agt gtg aaa cct 436
Ala Asp Arg Phe Gln Val Asp Leu Gln Asn Gly Ser Ser Val Lys Pro
45 50 55

cga gcc gat gtg gcc ttt cat ttc aat cct cgt ttc aaa agg gcc ggc 484
Arg Ala Asp Val Ala Phe His Phe Asn Pro Arg Phe Lys Arg Ala Gly
60 65 70

tgc att gtt tgc aat act ttg ata aat gaa aaa tgg gga cgg gaa gag 532
Cys Ile Val Cys Asn Thr Leu Ile Asn Glu Lys Trp Gly Arg Glu Glu
75 80 85

atc acc tat gac acg cct ttc aaa aga gaa aag tct ttt gag atc gtg 580
Ile Thr Tyr Asp Thr Pro Phe Lys Arg Glu Lys Ser Phe Glu Ile Val
90 95 100 105

att atg gtg cta aag gac aaa ttc cag atg ata att tac tgt gat gtt 628
Ile Met Val Leu Lys Asp Lys Phe Gln Met Ile Ile Tyr Cys Asp Val
110 115 120

aga gaa ttc aag gtt gca gta aat ggc gta cac agc ctg gag tac aaa 676
Arg Glu Phe Lys Val Ala Val Asn Gly Val His Ser Leu Glu Tyr Lys
125 130 135

cac aga ttt aaa gag ctc agc agt att gac acg ctg gaa att aat gga 724
His Arg Phe Lys Glu Leu Ser Ser Ile Asp Thr Leu Glu Ile Asn Gly
140 145 150

gac atc cac tta ctg gaa caa tca ttc aat caa aag agt gaa atg aag 772
Asp Ile His Leu Leu Glu Gln Ser Phe Asn Gln Lys Ser Glu Met Lys
155 160 165

cac att aac aaa gca gga ggc gcc acg gac cgc ctc cct cca cac cgc 820
His Ile Asn Lys Ala Gly Gly Ala Thr Asp Arg Leu Pro Pro His Arg
170 175 180 185

tcc ttc cgc ctt cat tcc ttg ccc aca ggc ttg cac tgg aag ctg aat 868
Ser Phe Arg Leu His Ser Leu Pro Thr Gly Leu His Trp Lys Leu Asn
190 195 200

aag aat ccc caa aac tca aac ttc cta ggg atg cca ccc ctt tag 913
Lys Asn Pro Gln Asn Ser Asn Phe Leu Gly Met Pro Pro Leu *
205 210 215

tagctcacac ctcccccctc caagagctaa gaaacaaagg agaatgtact tttgtagctt 973
agataagcaa tgaatcagta aaggactgat ctacttgctc caccacccct cccttaataa 1033
taacatttac tgttatttcc tgggcctaag acttatgttc cagaactgtc acagctcccc 1093
atgtcacacc cactagcttg tgatctttgt caaataactg aaatctttta agcctctagt 1153
ttcttccttt gtaaaacaga gataaaatgt tgtggttttt aagtgagata atccaagtaa 1213
agcacctaac atggagtagt gaatgaacat cggttgctac taaaagtgga catcctaccg 1273
catccttaat gccactaggc atttccatac aatctgggga ccaaaacttc aatcatataa 1333
atgtatgagg ttaattaaaa acactactgt aatctgcttg tatgatcaca aaccaccaca 1393
aaagaaaaga tcgtgaagat tacactgtaa acggactctc aaatgatcag gaggtggtca 1453
cttcgcaact tgctccctcc acccaactca aaacaggagc tcgagcctgc ctgtatttga 1513
gactggagct gcctgtatga ggactggatc aactgctagt cacgttatat ccaaatctgc 1573
attatcattg ggcacatttt cacagaattt tactgaatta ttccttaatt gtttaatggt 1633
tgggaatagt ttgggaatta ccttccatca actctgctaa gaaaggaatg gattctggta 1693
gcaagacaat ataattctcc tttagttttt cagccagtgc taacacagta atcaaagcag 1753
caaatcgaac ctgaaaggga taaaagagca aagaaataaa aagtagtgtt actgtattta 1813
ttatcttaag agctgtactg acttgagaca agctctaact ttttaaacat tagttcacac 1873
gcgtttattc acttcattat gttcattaag ctttcatctt agaataccag tttcaccatt 1933
tgggagctgt ttgtaatatg tgcaacctta taaatagtgt tttccaaact gtgtcccagg 1993
actgcaaatc tttaatgtga aatgtctttt tataatctct tcctttaaaa aaaaccaata 2053
aaataaaatg ccacatgcaa actc 2077



<210> 14 
<211> 316 
<212> PRT 

<213> Homo sapiens 



<400> 14 

Met Leu Ser Leu Asn Asn Leu Gln Asn Ile Ile Tyr Asn Pro Val Ile 
1 5 10 15 
Pro Tyr Val Gly Thr Ile Pro Asp Gln Leu Asp Pro Gly Thr Leu Ile 
20 25 30 
Val Ile Cys Gly His Val Pro Ser Asp Ala Asp Arg Phe Gln Val Asp 
35 40 45 
Leu Gln Asn Gly Ser Ser Val Lys Pro Arg Ala Asp Val Ala Phe His 
50 55 60 
Phe Asn Pro Arg Phe Lys Arg Ala Gly Cys Ile Val Cys Asn Thr Leu 
65 70 75 80 
Ile Asn Glu Lys Trp Gly Arg Glu Glu Ile Thr Tyr Asp Thr Pro Phe 
85 90 95 
Lys Arg Glu Lys Ser Phe Glu Ile Val Ile Met Val Leu Lys Asp Lys 
100 105 110 
Phe Gln Val Ala Val Asn Gly Lys His Thr Leu Leu Tyr Gly His Arg 
115 120 125 
Ile Gly Pro Glu Lys Ile Asp Thr Leu Gly Ile Tyr Gly Lys Val Asn 
130 135 140 
Ile His Ser Ile Gly Phe Ser Phe Ser Ser Asp Leu Gln Ser Thr Gln 
145 150 155 160 
Ala Ser Ser Leu Glu Leu Thr Glu Ile Ser Arg Glu Asn Val Pro Lys 
165 170 175 
Ser Gly Thr Pro Gln Leu Ser Leu Pro Phe Ala Ala Arg Leu Asn Thr 
180 185 190 
Pro Met Gly Pro Gly Arg Thr Val Val Val Lys Gly Glu Val Asn Ala 
195 200 205 
Asn Ala Lys Ser Phe Asn Val Asp Leu Leu Ala Gly Lys Ser Lys Asp 
210 215 220 
Ile Ala Leu His Leu Asn Pro Arg Leu Asn Ile Lys Ala Phe Val Arg 
225 230 235 240 
Asn Ser Phe Leu Gln Glu Ser Trp Gly Glu Glu Glu Arg Asn Ile Thr 
245 250 255 
Ser Phe Pro Phe Ser Pro Gly Met Tyr Phe Glu Met Ile Ile Tyr Cys 
260 265 270 
Asp Val Arg Glu Phe Lys Val Ala Val Asn Gly Val His Ser Leu Glu 
275 280 285 
Tyr Lys His Arg Phe Lys Glu Leu Ser Ser Ile Asp Thr Leu Glu Ile 
290 295 300 
Asn Gly Asp Ile His Leu Leu Glu Val Arg Ser Trp 
305 310 315 



<210> 15 

<211> 358 

<212> PRT 

<213> Homo sapiens 



<400> 15 

Met Leu Ser Leu Asn Asn Leu Gln Asn Ile Ile Tyr Asn Pro Val Ile 
1 5 10 15 
Pro Tyr Val Gly Thr Ile Pro Asp Gln Leu Asp Pro Gly Thr Leu Ile 
20 25 30 
Val Ile Cys Gly His Val Pro Ser Asp Ala Asp Arg Phe Gln Val Asp 
35 40 45 
Leu Gln Asn Gly Ser Ser Val Lys Pro Arg Ala Asp Val Ala Phe His 
50 55 60 
Phe Asn Pro Arg Phe Lys Arg Ala Gly Cys Ile Val Cys Asn Thr Leu 
65 70 75 80 
Ile Asn Glu Lys Trp Gly Arg Glu Glu Ile Thr Tyr Asp Thr Pro Phe 
85 90 95 
Lys Arg Glu Lys Ser Phe Glu Ile Val Ile Met Val Leu Lys Asp Lys 
100 105 110 
Phe Gln Val Ala Val Asn Gly Lys His Thr Leu Leu Tyr Gly His Arg 
115 120 125 
Ile Gly Pro Glu Lys Ile Asp Thr Leu Gly Ile Tyr Gly Lys Val Asn 
130 135 140 
Ile His Ser Ile Gly Phe Ser Phe Ser Ser Asp Leu Gln Ser Thr Gln 
145 150 155 160 
Ala Ser Ser Leu Glu Leu Thr Glu Ile Ser Arg Glu Asn Val Pro Lys 
165 170 175 
Ser Gly Thr Pro Gln Leu Pro Ser Asn Arg Gly Gly Asp Ile Ser Lys 
180 185 190 
Ile Ala Pro Arg Thr Val Tyr Thr Lys Ser Lys Asp Ser Thr Val Asn 
195 200 205 
His Thr Leu Thr Cys Thr Lys Ile Pro Pro Met Asn Tyr Val Ser Lys 
210 215 220 
Ser Leu Pro Phe Ala Ala Arg Leu Asn Thr Pro Met Gly Pro Gly Arg 
225 230 235 240 
Thr Val Val Val Lys Gly Glu Val Asn Ala Asn Ala Lys Ser Phe Asn 
245 250 255 
Val Asp Leu Leu Ala Gly Lys Ser Lys Asp Ile Ala Leu His Leu Asn 
260 265 270 
Pro Arg Leu Asn Ile Lys Ala Phe Val Arg Asn Ser Phe Leu Gln Glu 
275 280 285 
Ser Trp Gly Glu Glu Glu Arg Asn Ile Thr Ser Phe Pro Phe Ser Pro 
290 295 300 
Gly Met Tyr Phe Glu Met Ile Ile Tyr Cys Asp Val Arg Glu Phe Lys 
305 310 315 320 
Val Ala Val Asn Gly Val His Ser Leu Glu Tyr Lys His Arg Phe Lys 
325 330 335 
Glu Leu Ser Ser Ile Asp Thr Leu Glu Ile Asn Gly Asp Ile His Leu 
340 345 350 
Leu Glu Val Arg Ser Trp 
355 



<210> 16 

<211> 368 

<212> PRT 

<213> Homo sapiens 



<400> 16 

Met Leu Ser Leu Asn Asn Leu Gln Asn Ile Ile Tyr Asn Pro Val Ile 
1 5 10 15 
Pro Tyr Val Gly Thr Ile Pro Asp Gln Leu Asp Pro Gly Thr Leu Ile 
20 25 30 
Val Ile Cys Gly His Val Pro Ser Asp Ala Asp Arg Phe Gln Val Asp 
35 40 45 
Leu Gln Asn Gly Ser Ser Val Lys Pro Arg Ala Asp Val Ala Phe His 
50 55 60 
Phe Asn Pro Arg Phe Lys Arg Ala Gly Cys Ile Val Cys Asn Thr Leu 
65 70 75 80 
Ile Asn Glu Lys Trp Gly Arg Glu Glu Ile Thr Tyr Asp Thr Pro Phe 
85 90 95 
Lys Arg Glu Lys Ser Phe Glu Ile Val Ile Met Val Leu Lys Asp Lys 
100 105 110 
Phe Gln Val Ala Val Asn Gly Lys His Thr Leu Leu Tyr Gly His Arg 
115 120 125 
Ile Gly Pro Glu Lys Ile Asp Thr Leu Gly Ile Tyr Gly Lys Val Asn 
130 135 140 
Ile His Ser Ile Gly Phe Ser Phe Ser Ser Asp Leu Gln Ser Thr Gln 
145 150 155 160 
Ala Ser Ser Leu Glu Leu Thr Glu Ile Ser Arg Glu Asn Val Pro Lys 
165 170 175 
Ser Gly Thr Pro Gln Leu Ser Leu Pro Phe Ala Ala Arg Leu Asn Thr 
180 185 190 
Pro Met Gly Pro Gly Arg Thr Val Val Val Lys Gly Glu Val Asn Ala 
195 200 205 
Asn Ala Lys Ser Phe Asn Val Asp Leu Leu Ala Gly Lys Ser Lys Asp 
210 215 220 
Ile Ala Leu His Leu Asn Pro Arg Leu Asn Ile Lys Ala Phe Val Arg 
225 230 235 240 
Asn Ser Phe Leu Gln Glu Ser Trp Gly Glu Glu Glu Arg Asn Ile Thr 
245 250 255 
Ser Phe Pro Phe Ser Pro Gly Met Tyr Phe Glu Met Ile Ile Tyr Cys 
260 265 270 
Asp Val Arg Glu Phe Lys Val Ala Val Asn Gly Val His Ser Leu Glu 
275 280 285 
Tyr Lys His Arg Phe Lys Glu Leu Ser Ser Ile Asp Thr Leu Glu Ile 
290 295 300 
Asn Gly Asp Ile His Leu Leu Glu Gln Ser Phe Asn Gln Lys Ser Glu 
305 310 315 320 
Met Lys His Ile Asn Lys Ala Gly Gly Ala Thr Asp Arg Leu Pro Pro 
325 330 335 
His Arg Ser Phe Arg Leu His Ser Leu Pro Thr Gly Leu His Trp Lys 
340 345 350 
Leu Asn Lys Asn Pro Gln Asn Ser Asn Phe Leu Gly Met Pro Pro Leu 
355 360 365 



<210> 17 
<211> 215 
<212> PRT 
<213> Homo sapiens 

<400> 17 

Met Leu Ser Leu Asn Asn Leu Gln Asn Ile Ile Tyr Asn Pro Val Ile 
1 5 10 15 
Pro Tyr Val Gly Thr Ile Pro Asp Gln Leu Asp Pro Gly Thr Leu Ile 
20 25 30 
Val Ile Cys Gly His Val Pro Ser Asp Ala Asp Arg Phe Gln Val Asp 
35 40 45 
Leu Gln Asn Gly Ser Ser Val Lys Pro Arg Ala Asp Val Ala Phe His 
50 55 60 
Phe Asn Pro Arg Phe Lys Arg Ala Gly Cys Ile Val Cys Asn Thr Leu 
65 70 75 80 
Ile Asn Glu Lys Trp Gly Arg Glu Glu Ile Thr Tyr Asp Thr Pro Phe 
85 90 95 
Lys Arg Glu Lys Ser Phe Glu Ile Val Ile Met Val Leu Lys Asp Lys 
100 105 110 
Phe Gln Met Ile Ile Tyr Cys Asp Val Arg Glu Phe Lys Val Ala Val 
115 120 125 
Asn Gly Val His Ser Leu Glu Tyr Lys His Arg Phe Lys Glu Leu Ser 
130 135 140 
Ser Ile Asp Thr Leu Glu Ile Asn Gly Asp Ile His Leu Leu Glu Gln 
145 150 155 160 
Ser Phe Asn Gln Lys Ser Glu Met Lys His Ile Asn Lys Ala Gly Gly 
165 170 175 
Ala Thr Asp Arg Leu Pro Pro His Arg Ser Phe Arg Leu His Ser Leu 
180 185 190 
Pro Thr Gly Leu His Trp Lys Leu Asn Lys Asn Pro Gln Asn Ser Asn 
195 200 205 
Phe Leu Gly Met Pro Pro Leu 
210 215 

<210> 18 

<211> 504 

<212> DNA 

<213> Homo sapiens 

<220> 

<221> allele 
<222> 81 

<223> 99-7177-81 : polymorphic base C or T 
<220> 

<221> misc_binding 
<222> 69 . . 93 
<223> 99-7177-81. probe 

<220> 

<221> primer_bind 
<222> 62 . . 80 
<223> 99-7177-81 .mis 

<220> 

<221> primer_bind 
<222> 82 . .100 

<223> 99-7177-81 .mis complement 
<220> 

<221> primer_bind 
<222> 1 . . 20 
<223> 99-7177. pu 

<220> 

<221> primer_bind 
<222> 484 . .504 
<223> 99-7177. rp complement 

<400> 18 

aatcctgacc caccttctcc caagcacgca tgtagaggaa agaaagcaag agcgatagct 60 
gaggggatca gcctactaga yggaggcagg tgtttcaaga tggtgttgga agggcaagcc 120 
gagaactcta gtagcgggga ggggaaaact aaaactttat tactgtaagc aaatatcaca 180 
gcaaatcagc cttaagtagg tataaaagaa cccataaaag aagacaaaat gtaaccaaag 240 
ctcaccagac cacagaagag tcatcactgg agtcggaaga cagacgcgct ggatcctgca 300 
gtaggagttg gggcatcccc cagcatagga caacagcaac cttcaatcct ccttcgtata 360 
agctcctttt attaagtcca attgttactt tgggcaccct ctgttgtttg ctggtgaggg 420 
cccttcccca gcaagcaaca ctgaaacagt ggttctggga gcagcgtcct gggacgcgtt 480 
ccaggacttg agttaatttc tggg 504 

<210> 19 
<211> 488 
<212> DNA 
<213> Homo sapiens 

<220> 
<221> allele 
<222> 345 
<223> 99-7212-346 : polymorphic base C or T 



<220> 

<221> misc_binding 

<222> 333 . .357 

<223> 99-7212 -346 .probe 



<220> 

<221> primer_bind 

<222> 326 . .344 

<223> 99-7212-346 .mis 



<220> 

<221> primer_bind 
<222> 346 . .364 

<223> 99-7212 -346 .mis complement 



<220> 

<221> primer_bind 
<222> 1 . . 20 
<223> 99-7212. pu 



<220> 

<221> primer_bind 

<222> 470 . .488 

<223> 99-7212. rp complement 



<400> 19 

gctccttatg taattgaatg aatggtattt ttatcagatg ctttttaaaa gtcagtacac 60 
aattccatct atttcacagc aaattctaca gaaatagcag ctagacagca ggaagctgtg 120 
gcttactgtt tagtgacttg tgattgtaat taaatgatta gtcttccact ccattccctc 180 
caacttgtct tgggtctggg gaggtaggga ggacaaatgc aaaatccata gagtcaagga 240 
tatagtgagg agtttacttt gccattgact ctgacaatca atcgtcagtg agacatgctg 300 
attgtgatga gaacatgact aaagacaaga ttccttcaag gtagygctct cacgttttca 360 
ttcaatgaaa aactattggt gttgtataac ccaatgaatc atttttgtat tttgaatctt 420 
taaaaatata tacaagtgct attttgcttg aagtgctgtt tatttataag gttgacaatt 480 
aaactgac 488 



<210> 20 

<211> 542 

<212> DNA 

<213> Homo sapiens 



<220> 

<221> allele 
<222> 226 

<223> 99-7193-228 : polymorphic base G or C 



<220> 

<221> misc_binding 

<222> 214 . .238 

<223> 99-7193-228 .probe 

<220> 

<221> primer_bind 

<222> 207 . . 225 

<223> 99-7193-228 .mis 



<220> 

<221> primer_bind 
<222> 227 . .245 

<223> 99-7193-228 .mis complement 



<220> 

<221> primer_bind 
<222> 1. .20 
<223> 99-7193. pu 

<220> 

<221> primer_bind 
<222> 522 . . 542 
<223> 99-7193. rp complement 

<400> 20 

gaggtaaaaa tagcaggcag gagaacagat cttttaggat tgtgaattgt aatgtggaac 60 
atgaaaactt catcatcttc tgtgtgctgg ctagtgtcag ttatcctttg ctgtataaaa 120 
atcaccccca aaattagtga tttgaaacaa ctgtccccat ttatttactc atgattttgc 180 
agttgctcag gacttggtgg ggataccttg actctgcttc tcgcastgtt gactgaggtc 240 
atacttgcag ctacattcag ctggcagctt cattggggct ggaacagcaa agacagcttc 300 
cctaacatac ctggcacctc agccaaggtg gctgcaatgg ctggaggctc actgggcctt 360 
tcacttctgt gtggtttctc gtgatttcgt agtctatcct gaactccttt tcacggcaac 420 
tggaatgcaa aaagatgaaa acagaagcta caatgtctgg gaacagaagt cctagaatgt 480 
cacttctact acacctatta ctcactatta gtcaaaataa actcctctcc caatacttct 540 
ca 542 

<210> 21 
<211> 528 
<212> DNA 
<213> Homo sapiens 

<220> 
<221> allele 
<222> 212 
<223> 99-7186-212 : polymorphic base A or G 

<220> 
<221> misc_binding 
<222> 200..224 
<223> 99-7186-212.probe 

<220> 
<221> primer_bind 
<222> 193..211 
<223> 99-7186-212.mis 

<220> 
<221> primer_bind 
<222> 213..231 
<223> 99-7186-212.mis complement 

<220> 
<221> primer_bind 
<222> 1..19 
<223> 99-7186.pu 

<220> 
<221> primer_bind 
<222> 510..528 
<223> 99-7186.rp complement 

<400> 21 

gagtgccatg tgtgcataga ttgttgtctg ggttttttcc tttttgttac ttctgcaata 60 
tttagaacag tgactgacac atatcaggca ctcaataatt atttgctgaa tttctcaatg 120 
tctcgatttg gcataaggat ttcattttcc catggtatat tttcttccgt ggattgatgg 180 
gctagtacta atttgcacgg gtgtcttggt grttcacaat catggtttta atgtcccagt 240 
cccctttggc tacaggaggt acttgatcct aggtgactaa ggcagaaata aatagaatgt 300 
gtaggactcc tctggtgtaa aaagtcatgg gttccaaaag ttcatttata agtcaattgt 360 
ttggacatcc tgaacttatt ttcagaacac gattgggcac agctagttaa ctgcagggag 420 
gcctgaggag actggaaggt gccagaacct ggaaccagat ctgcccacta ggacaggacc 480 
agccctggaa ggacaggagc aggtgcactg gattctaaag gtgttcag 528 

<210> 22 

<211> 531 

<212> DNA 

<213> Homo sapiens 

<220> 

<221> allele 
<222> 49 

<223> 99-7182-49 : polymorphic base C or T 
<220> 

<221> misc_binding 

<222> 37 . .61 

<223> 99-7182-49 .probe 

<220> 

<221> primer_bind 

<222> 30 . .48 

<223> 99-7182-49 .mis 

<220> 

<221> primer_bind 
<222> 50 . . 68 

<223> 99-7182-49 .mis complement 
<220> 

<221> primer_bind 
<222> 1. .20 
<223> 99-7182. pu 

<220> 

<221> primer_bind 

<222> 511. .531 

<223> 99-7182. rp complement 

<400> 22 

gtgtgtagaa aagaaagatg gctgtcattt gagttgttaa gaacagcayg ctgcaatacc 60 
aaaacatcaa gctgtacatc tcaaatatgt atgattttca tatgtgaatc acatctcaat 120 
aaagctgtta gaaaaataaa attaccatta agtttaaaaa aaaaaagaaa aaaagaaaaa 180 
aacaaccaca gtcggggcaa gggccatgtt actagggcca gggatttggc caatgaagca 240 
ggaacataga gatcctaggt ccataaggaa aagaagattc aaggaaggcc aggacatggg 300 
agggaatgaa caaactccag tcctagaggt ttagcagaga ctagctggct tcttgcagtg 360 
aattaataaa tgagaaaaaa atctgagatc acaataaaag atctttactg gtgcaagggc 420 
cacttctcac cgctgtttga ctgctttggg tcattcttta gtaccttaag ttttttatat 480 
tttgtgaaga ttttactatt ttttwatctg caagagagta agttcaatca a 531 
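
In the nucleotide sequences of this listing, each polymorphic base is written as an IUPAC ambiguity code at the position given in the <222> field of the corresponding allele feature. As a minimal sketch of how an annotated position can be cross-checked against the stated alleles, the Python snippet below looks up the code found at a 1-based position. The helper name `bases_at` is a hypothetical choice made for this illustration; the sample row is the first 60 bases of SEQ ID NO 22, whose allele feature is annotated at position 49 as "polymorphic base C or T".

```python
# Illustrative sketch only: bases_at is a hypothetical helper, not part of
# the sequence listing. The table is the standard IUPAC nucleotide code.
IUPAC = {
    "a": {"A"}, "c": {"C"}, "g": {"G"}, "t": {"T"},
    "r": {"A", "G"}, "y": {"C", "T"}, "s": {"C", "G"}, "w": {"A", "T"},
    "k": {"G", "T"}, "m": {"A", "C"},
    "b": {"C", "G", "T"}, "d": {"A", "G", "T"},
    "h": {"A", "C", "T"}, "v": {"A", "C", "G"},
    "n": {"A", "C", "G", "T"},
}


def bases_at(sequence_rows: str, position: int) -> set:
    """Return the bases allowed at a 1-based position of a listing sequence.

    Spaces, line breaks and the running position counters at the end of each
    row are ignored, so rows can be pasted in as printed.
    """
    residues = [ch for ch in sequence_rows.lower() if ch in IUPAC]
    return IUPAC[residues[position - 1]]


if __name__ == "__main__":
    row = "gtgtgtagaa aagaaagatg gctgtcattt gagttgttaa gaacagcayg ctgcaatacc 60"
    print(sorted(bases_at(row, 49)))
```

Only the sequence rows should be passed to the helper, since header text would otherwise contribute letters that happen to be valid codes.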

<210> 23 

<211> 546 

<212> DNA 

<213> Homo sapiens 

<220> 

<221> allele 
<222> 372 

<223> 99-1585-373 : polymorphic base C or T 



<220> 

<221> misc_binding 

<222> 360 . . 384 

<223> 99-1585-373 .probe 



<220> 

<221> primer_bind 

<222> 353 . .371 

<223> 99-1585-373 .mis 



<220> 

<221> primer_bind 
<222> 373 . .391 

<223> 99-1585-373 .mis complement 



<220> 

<221> primer_bind 
<222> 1 . . 20 
<223> 99-1585. pu 



<220> 

<221> primer_bind 

<222> 527. .546 

<223> 99-1585. rp complement 



<220> 

<221> misc_feature 
<222> 52 . . 53 , 55 
<223> n=a, g, c or t 



<400> 23 

cctgcaacat ttttwatgtg tagaattctg tgaatgaatc caacttcggc anntnttttt 60 
ttcttttctt ttttttaatc aaggaagtgg agacaagatg tgaaggggtg gcctgcccct 120 
ccacacctgt ggatatttct agtcaggtgg gacgagagac tgagaaaata aataaaacac 180 
agagacaaag tatagagaaa caacagtggg cccagggaac cggcgctcag cataccaagg 240 
acctgcaccg gcaccatctc tgagttccct cagtttttat tgattattat cttcgttatt 300 
tcagcaaaaa ggaatgtagt aggagagcag ggtgataata aggagaaggt cagcaacgaa 360 
catgtgagca ayagaatcta cgtcataatk aagttcaagg gaaggtacta tgactggacg 420 
tgcahgtaag ccagatttat gtttctctcc acccaaacat ctcggtggag taaagaataa 480 
caaggcagca ttgctgcaaa catgtctcgc ctcccgccat agggcggttt ttctcctatc 540 
tcagaa 546 



<210> 24 

<211> 396 

<212> DNA 

<213> Homo sapiens 



<220> 

<221> allele 
<222> 278 

<223> 99-1587-281 : polymorphic base A or G 



<220> 

<221> misc_binding 

<222> 266 . . 290 

<223> 99-1587-281 .probe 



<220> 

<221> primer_bind 



<222> 259. .277 

<223> 99-1587-281 .mis 



<220> 

<221> primer_bind 
<222> 279 . .297 

<223> 99-1587-281 .mis complement 
<220> 

<221> primer_bind 
<222> 1. .21 
<223> 99-1587. pu 

<220> 

<221> primer_bind 

<222> 377 . .396 

<223> 99-1587. rp complement 

<220> 

<221> misc_feature 
<222> 48 

<223> n=a, g, c or t 
<400> 24 

taatggtagt tgatgaggtc ctatgtaata tgcatttcct tggttgcnaa tagcaaatta 60 
ctacacacac agaaaggaaa gccacactcc ccgacacdwc tacacacagg aggactcaca 120 
caggagggag actcaaagaa ggcacgtgac ttttacattg ttagggctta catggtcctg 180 
ggatttccca ccagtactca aaagatcaat tgtatgaaca agtcacctat ttttacggca 240 
ctaaataatt attattcaac aacatggaaa atatgtgrta gcagacctgg attttcctta 300 
agagttattt ttatgtggta ctgccccctg ctggaatata acatctatac acatcctttc 360 
tggctgggct gacatcctaa aaccagccca ggacca 396 
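
The feature tables of the marker sequences above follow a regular pattern relative to the polymorphic position: the misc_binding probe spans 12 bases on either side of it (25 bases in total), the ".mis" primer_bind feature covers the 19 bases immediately upstream, and the ".mis complement" feature covers the 19 bases immediately downstream. The Python sketch below derives those three ranges from an allele position; it is inferred from the annotations only, and the function name `feature_ranges` and its default widths are assumptions made for illustration.

```python
# Illustrative sketch only: feature_ranges is a hypothetical helper that
# reproduces the pattern seen in the <222> fields of this listing.
def feature_ranges(allele_pos: int, probe_flank: int = 12, primer_len: int = 19) -> dict:
    """Return 1-based inclusive (start, end) ranges around an allele position."""
    return {
        # Probe centred on the polymorphic base.
        "probe": (allele_pos - probe_flank, allele_pos + probe_flank),
        # Primer ending on the base just before the polymorphic base.
        "mis": (allele_pos - primer_len, allele_pos - 1),
        # Complementary primer starting on the base just after it.
        "mis complement": (allele_pos + 1, allele_pos + primer_len),
    }


if __name__ == "__main__":
    # SEQ ID NO 24 annotates its allele at position 278.
    for name, (start, end) in feature_ranges(278).items():
        print(f"{name}: {start}..{end}")
```

For SEQ ID NO 24 the computed ranges agree with the annotated ones (266..290, 259..277 and 279..297), which gives a simple consistency check on <222> fields that may have been damaged during text extraction.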

<210> 25 

<211> 447 

<212> DNA 

<213> Homo sapiens 

<220> 

<221> allele 
<222> 283 

<223> 99-13798-284 : polymorphic base A or G 
<220> 

<221> misc_binding 

<222> 271. .295 

<223> 99-13798-284 .probe 

<220> 

<221> primer_bind 

<222> 264 . .282 

<223> 99-13798-284 .mis 

<220> 

<221> primer_bind 
<222> 284. .302 

<223> 99-13798-284 .mis complement 
<220> 

<221> primer_bind 

<222> 1. .20 

<223> 99-13798. pu 



<220> 

<221> primer_bind 
<222> 427 . .447 

<223> 99-13798. rp complement 
<220> 

<221> misc_feature 

<222> 34,416 

<223> n=a, g, c or t 

<400> 25 

gaggaaaagg actttggatg tctggtgtca ctgnctgcac accaggcaca cagcaggtgc 60 
tcaataagta tttgatgaat atatcaaatg aatgaggagt gtgacacagt tcaagaagaa 120 
aatcaaatga aaaattaggc ttcttagcag cccgaaaaga gctctttatc tagaaattgt 180 
caaaccagct gatgcaagtt tttttggtgt taacaaggca gccgcaagat tgctatggag 240 
aggacaccgt gtaccatgga gattaacggc atgagcttta gcrgcagcta accccgtgca 300 
gatgtgtgac ttggacaggt tactgagctt gctaagcccc tgtctcactc tccaaacagg 360 
gataatgaca cctctctcac aaggtggttg tgaggattaa atgaggtaat cctttnaagc 420 
tcccatccta gcacacgtaa gaagcat 447 

<210> 26 

<211> 506 

<212> DNA 

<213> Homo sapiens 

<220> 

<221> allele 
<222> 402 

<223> 99-1601-402 : polymorphic base A or T 
<220> 

<221> misc_binding 

<222> 390 . .414 

<223> 99-1601-402 .probe 

<220> 

<221> primer_bind 

<222> 383 . .401 

<223> 99-1601-402 .mis 

<220> 

<221> primer_bind 

<222> 403 . .421 

<223> 99-1601-402 .mis complement 
<220> 

<221> primer_bind 

<222> 1. . 18 

<223> 99-1601. pu 

<220> 

<221> primer_bind 

<222> 486. .506 

<223> 99-1601. rp complement 

<400> 26 

ttggcttggc agggcaacca gctcaccaga ctctctgcag acccgaagtc attacataca 60 
gtatgataac agggaatgga cccgaccagc atttgctgga gatgatatct ggtgtcagcc 120 
cgacaggccc ctacctgctt ctcttgatat gcaggaatcc cttcaagctc caacaagatc 180 
tgtttaatag actggagagt cctttagttc cttcctctaa gggaaaatca gatcgttctg 240 
gtttgcttgg taactcctta cttcatccct gatgggaagt ttatagaatg aggaaccagg 300 
gctattacat gaaactataa aactgcctag agcacatact tggtattttt aacattgttg 360 
agagggactc acttaattca gccttgcagc tattgcattc cwgtccaaac caacggcagg 420 
ttctcaaaac aagcggtgaa agggttcctg ttgcagagct gtctggacat ttaaagaagg 480 
gagaggaaat ctcarggggt cggttg 506 

<210> 27 

<211> 546 

<212> DNA 

<213> Homo sapiens 

<220> 

<221> allele 
<222> 79 

<223> 99-13808-80 : polymorphic base A or T 
<220> 

<221> misc_binding 

<222> 67 . .91 

<223> 99-13808-80 .probe 

<220> 

<221> primer_bind 

<222> 60 . . 78 

<223> 99-13808-80 .mis 

<220> 

<221> primer_bind 

<222> 80 . . 98 

<223> 99-13808-80 .mis complement 
<220> 

<221> primer_bind 

<222> 1. .20 

<223> 99-13808. pu 

<220> 

<221> primer_bind 
<222> 526. .546 

<223> 99-13808. rp complement 
<220> 

<221> allele 
<222> 266 

<223> 99-13808-268 : polymorphic base A or C 
<220> 

<221> misc_binding 

<222> 254 . .278 

<223> 99-13808-268 .probe 

<220> 

<221> primer_bind 

<222> 247 . . 265 

<223> 99-13808-268 .mis 

<220> 

<221> primer_bind 
<222> 267. .285 

<223> 99-13808-268 .mis complement 
<220> 

<221> allele 
<222> 419 
<223> 99-13808-425 : polymorphic base G or C 



<220> 

<221> misc_binding 

<222> 407 . .431 

<223> 99-13808-425 .probe 



<220> 

<221> primer_bind 

<222> 400 . .418 

<223> 99-13808-425 .mis 



<220> 

<221> primer_bind 

<222> 420 . .438 

<223> 99-13808-425 .mis complement 



<220> 

<221> allele 
<222> 453 

<223> 99-13808-455 : polymorphic base A or G 



<220> 

<221> misc_binding 

<222> 441. .465 

<223> 99-13808-455 .probe 



<220> 

<221> primer_bind 

<222> 434 . .452 

<223> 99-13808-455 .mis 



<220> 

<221> primer_bind 
<222> 454 . .472 

<223> 99-13808-455 .mis complement 



<400> 27 

gttgtgcctt aaagaatttg ctcatccaca gagtgccaac tgcattagaa agaaaacaac 60 
tctcctttct aactcaccwg cattgatttt ctgttgttgg catgtagaag agtatttcaa 120 
agaatgaatg aaagctataa tatttattag aagtaaaaaa gttctaaaga tatgctacct 180 
tactgggatg cttagagacc atttgcaaac cctgtttatg atctagaaat cctgtttttc 240 
attttttatt tgtaaaactc tataaratctc aaaaaatttt aggtggatta tcatgtacct 300 
aagggtaaaa tatagttgaa attattctta cctgattttt catatctgaa tttcgtgggc 360 
agttcaaagt aattgtatca cattcttcag ctaggaaaaa aaaaaagaaa gaaagaaasa 420 
aacaaagtgt gattttaaaa agcacacact ccrtggtgta agacctaaaa ttaaggttca 480 
gtgtcacatg ctgccttggc atctggtaaa atcagaagag ctggactaca aatycctctc 540 
caaact 546 



<210> 28 
<211> 476 
<212> DNA 
<213> Homo sapiens 



<220> 

<221> allele 
<222> 212 

<223> 99-13810-214 : polymorphic base C or T 



<220> 

<221> misc_binding 
<222> 200 . . 224 
<223> 99-13810-214 .probe 



<220> 

<221> primer_bind 

<222> 193 . .211 

<223> 99-13810-214. mis 

<220> 

<221> primer_bind 
<222> 213 . .231 

<223> 99-13810-214 .mis complement 
<220> 

<221> primer_bind 

<222> 1 . . 18 

<223> 99-13810. pu 

<220> 

<221> primer_bind 
<222> 458 . . 476 

<223> 99-13810. rp complement 
<220> 

<221> allele 
<222> 168 

<223> 99-13810-170 : polymorphic base A or T 
<220> 

<221> misc_binding 

<222> 156 . . 180 

<223> 99-13810-170 .probe 

<220> 

<221> primer_bind 

<222> 149. . 167 

<223> 99-13810-170 .mis 

<220> 

<221> primer_bind 
<222> 169 . . 187 

<223> 99-13810-170 .mis complement 
<400> 28 

gcattcccag attgtaacat agttttaagt aaacatccac tgaaagtctg catggaagaa 60 
cacagaagcc agagcaagtt cagggctcct agaaagacga tgctggagct agccctagag 120 
aatggctgag aattggatga actcagaaga agcagcaaag tagttgcwgg tggcaggcat 180 
ggcaggagaa gggatcaggt ggctggaaga gyggagggta tagaactgaa acagagagtc 240 
tgttggaggt ggacagagga aggcgggatt agatgagaaa tgacggaccc agtttctaag 300 
aaagaccaag aaagataagc aaagggattt aggtgggatg cccttctagg ttctcgggaa 360 
acttgctacc tgccttgcac tgactttgca tgagggaaga tggtcaacac agtcttgcaa 420 
gaagtcagac aagcaggcaa tgacaattct ctgagatggc aaatagggat tgggct 476 

<210> 29 
<211> 454 
<212> DNA 

<213> Homo sapiens 
<220> 

<221> allele 
<222> 127 

<223> 99-13790-129 : polymorphic base C or T 
<220> 

<221> misc_binding 

<222> 115 . .139 

<223> 99-13790-129 .probe 



<220> 

<221> primer_bind 

<222> 108 . . 126 

<223> 99-13790-129 .mis 



<220> 

<221> primer_bind 
<222> 128. .146 

<223> 99-13790-129 .mis complement 



<220> 

<221> primer_bind 

<222> 1. .20 

<223> 99-13790. pu 



<220> 

<221> primer_bind 
<222> 434 . .454 

<223> 99-13790. rp complement 



<400> 29 

gtcattttac taagcctttc agacagtaga gagtgggatt atacttgtcc caacagctca 60 
ccctcctaaa ggtcaaacct aaaccatttt ggttctcttg ttcaagttca ggttgccagt 120 
gaaaagyaaa ggaacttgaa attcatgtta aacatttaac atctttccat atgaattgct 180 
aggaagcaac ttccattcca aagttgtgtt aacttcacag ttttcccacc tgtggtgaag 240 
atggtacaaa atagcttaaa aactgatttt gttccatcag attctaatct ttagtcacag 300 
aattcaaggc catactctaa actttaaggt tggcagaaat atattataac agaaatttta 360 
gcaccatgta aatgtttaaa gttatttagc cttaaataca gaaccattta actcagggtt 420 
gaaaagtcag gatgaagtga gggwttgatt gatt 454 

<210> 30 

<211> 444 

<212> DNA 

<213> Homo sapiens 



<220> 

<221> allele 
<222> 153 

<223> 99-13809-153 : polymorphic base A or G 



<220> 

<221> misc_binding 

<222> 141. .165 

<223> 99-13809-153 .probe 



<220> 

<221> primer_bind 

<222> 134 . . 152 

<223> 99-13809-153 .mis 



<220> 

<221> primer_bind 
<222> 154 . . 172 

<223> 99-13809-153 .mis complement 



<220> 

<221> primer_bind 
<222> 1. .21 

<223> 99-13809. pu 



<220> 

<221> primer_bind 
<222> 424 . .444 

<223> 99-13809. rp complement 



<400> 30 

caactgagtg aagagcaatg ggaatttgta gactttacag atgacatcac ccccatcata 60 
cacgatgaag ctcagcagac agttgctgct ttccatccct taaccaggat atccctgata 120 
aaggaaggac ccaagattag caaaactggc caracttcag gcagtcatct tattgctgga 180 
tgtcctggcc aacaaatcgc cccatctgca cagtttttat aaatttttgg accattgcct 240 
aagagttgca ccctttgtgg taaagaactc tcagaatctc ttgcctcaaa tacacccaaa 300 
ctataaataa agaaacagat gtctctatgt acagcaaggc caccatacaa ggcttcagca 360 
gaacatttcc agtctccttt ggagtcccac ttattactga cagtgagcaa gacactcatt 420 
tctcttctaa gaacatacaa cgcc 444 



<210> 31 

<211> 693 

<212> DNA 

<213> Homo sapiens 



<220> 

<221> allele 
<222> 162 

<223> 99-1597-162 : polymorphic base A or G 



<220> 

<221> misc_binding 

<222> 150 . . 174 

<223> 99-1597-162 .probe 



<220> 

<221> primer_bind 

<222> 143 . . 161 

<223> 99-1597-162 .mis 



<220> 

<221> primer_bind 
<222> 163 . .181 

<223> 99-1597-162 .mis complement 



<220> 

<221> primer_bind 
<222> 1. .19 
<223> 99-1597. pu 



<220> 

<221> primer_bind 

<222> 675 . . 693 

<223> 99-1597. rp complement 



<220> 

<221> misc_feature 

<222> 582 . .615 

<223> n=a, g, c or t 



<400> 31 

tttaagatgc caagttgtca aactgggcag ctaggcccca ggctctttct aaattgtcaa 60 
gactagcaaa gccgagtcat ccccctgctc tagttctgga tgacaccaag cctaggaaat 120 
aaagcacaat agatggggcc ctggtctctg aatgacagag trtgcatggg ggctaggagg 180 
240 
300 
360 
420 
480 
540 
600 
660 
693 



actatgtatt gggtggctac tttagcatca atc 

<210> 32 
<211> 26 
<212> DNA 

<213> Artificial Sequence 
<220> 

<223> oligonucleotide BAP283Ra6283 
<400> 32 

ggcggatgac tctctttgga aaccac 

<210> 33 
<211> 25 
<212> DNA 

<213> Artificial Sequence 
<220> 

<223> oligonucleotide BAP283Ra6324n 
<400> 33 

tgctaaagac gagagactcc tcgcc 

<210> 34 
<211> 29 
<212> DNA 

<213> Artificial Sequence 



<220> 

<223> oligonucleotide BAP28-exALF7311 
<400> 34 

ccccctatga tctgattcac caggcttac 

<210> 35 
<211> 26 
<212> DNA 

<213> Artificial Sequence 
<220> 

<223> oligonucleotide BAP28-exALF7319n 
<400> 35 

gatctgattc accaggctta cctccg 

<210> 36 
<211> 27 
<212> DNA 

<213> Artificial Sequence 
<220> 

<223> oligonucleotide PCTAexALF12 
<400> 36 

cccacctgaa aagactggat ttgggac 27 



<210> 37 
<211> 27 
<212> DNA 

<213> Artificial Sequence 
<220> 

<223> oligonucleotide PCTAexALF13n 
<400> 37 

ccacctgaaa agactggatt tgggaca 

<210> 38 
<211> 27 
<212> DNA 

<213> Artificial Sequence 
<220> 

<223> oligonucleotide PCTAexALR60 
<400> 38 

ctccgaagtt ctacaggatc atgtccc 

<210> 39 
<211> 27 
<212> DNA 

<213> Artificial Sequence 
<220> 

<223> oligonucleotide PCTAexALR12n 
<400> 39 

cccaaatcca gtcttttcag gtgggag 

<210> 40 
<211> 25 
<212> DNA 

<213> Artificial Sequence 
<220> 

<223> oligonucleotide PCTAexBLF33 
<400> 40 

gaaaaccctg gcgttgaccc cgtgg 

<210> 41 
<211> 25 
<212> DNA 

<213> Artificial Sequence 
<220> 

<223> oligonucleotide PCTAexBLF120n 
<400> 41 

caccctggtc ctgaaaagtc cagcc 

<210> 42 
<211> 25 
<212> DNA 

<213> Artificial Sequence 
<220> 

<223> oligonucleotide PCTAexBLR140 
<400> 42 

taggagggag ggagattcgg ggctg 

<210> 43 
<211> 26 
<212> DNA 

<213> Artificial Sequence 
<220> 

<223> oligonucleotide PCTAexBLR40n 
<400> 43 

tccacggggt caacgccagg gttttc 

<210> 44 
<211> 27 
<212> DNA 

<213> Artificial Sequence 
<220> 

<223> oligonucleotide PCTA5Ra220n 
<400> 44 

gggaatggtg ccaacatacg ggattac 

<210> 45 
<211> 26 
<212> DNA 

<213> Artificial Sequence 
<220> 

<223> oligonucleotide PCTA5Ra230 
<400> 45 

gctgatcggg aatggtgcca acatac 

<210> 46 
<211> 25 
<212> DNA 

<213> Artificial Sequence 
<220> 

<223> oligonucleotide PCTA_5Ra400 
<400> 46 

tcacctacac ctcacccact tcctc 

<210> 47 
<211> 25 
<212> DNA 

<213> Artificial Sequence 
<220> 

<223> oligonucleotide PCTA_5Ran_4 



<400> 47 

cctacacctc acccacttcc tcccc 



<210> 48 
<211> 27 
<212> DNA 

<213> Artificial Sequence 
<220> 

<223> oligonucleotide PCTA_5Ra_394 
<400> 48 

cctccccacc actggttcat tttgaag 

<210> 49 
<211> 25 
<212> DNA 

<213> Artificial Sequence 
<220> 

<223> oligonucleotide PCTA_exD5Ra 
<400> 49 

tccccaggaa gtgtcatgga cggtg 

<210> 50 
<211> 27 
<212> DNA 

<213> Artificial Sequence 
<220> 

<223> oligonucleotide PCTA_exD5Ran 
<400> 50 

ggaagtgtca tggacggtgt cttctac 

<210> 51 
<211> 29 
<212> DNA 

<213> Artificial Sequence 
<220> 

<223> oligonucleotide PCTA_exC5Ra 
<400> 51 

gctttcatcg cccagctcca gcaattatc 

<210> 52 
<211> 25 
<212> DNA 

<213> Artificial Sequence 
<220> 

<223> oligonucleotide PCTA_exC5Ran 
<400> 52 

tcgcccagct ccagcaatta tcgac 

<210> 53 
<211> 25 
<212> DNA 

<213> Artificial Sequence 



<220> 

<223> oligonucleotide PCTAex9terLR330 



<400> 53 

acatggggag ctgtgacagt tctgg 

<210> 54 
<211> 29 
<212> DNA 

<213> Artificial Sequence 
<220> 

<223> oligonucleotide PCTAex9terLR325n 
<400> 54 

ggggagctgt gacagttctg gaacataag 

<210> 55 
<211> 27 
<212> DNA 

<213> Artificial Sequence 
<220> 

<223> oligonucleotide PCTAexCLF120 
<400> 55 

cttcaaaatg aaccagtggt ggggagg 

<210> 56 
<211> 25 
<212> DNA 

<213> Artificial Sequence 
<220> 

<223> oligonucleotide PCTAexCLF130n 
<400> 56 

accagtggtg gggaggaagt gggtg 

<210> 57 
<211> 20 
<212> DNA 

<213> Artificial Sequence 
<220> 

<223> oligonucleotide BAP28polyTcourt 
<400> 57 

tttttttttt tttttgtata 

<210> 58 
<211> 25 
<212> DNA 

<213> Artificial Sequence 
<220> 

<223> oligonucleotide BAP281LF12.1 
<400> 58 

ccatgtggga agcgctgtga agagt 

<210> 59 
<211> 26 
<212> DNA 
<213> Artificial Sequence 
<220> 

<223> oligonucleotide BAP28LR6726.1 
<400> 59 

cagctctata cgataggcag gagagg 

<210> 60 
<211> 38 
<212> DNA 

<213> Artificial Sequence 
<220> 

<223> oligonucleotide BAP28LF26SalI 
<400> 60 

cctgtgtcga ccgctgtgaa gagttgttgc cttccaag 

<210> 61 
<211> 36 
<212> DNA 

<213> Artificial Sequence 
<220> 

<223> oligonucleotide BAP28LR6717SalI 
<400> 61 

actccgtcga ccgataggca ggagaggctt atgtgg 

<210> 62 
<211> 18 
<212> DNA 

<213> Artificial Sequence 
<220> 

<223> sequencing oligonucleotide PrimerPU 
<400> 62 

tgtaaaacga cggccagt 

<210> 63 
<211> 18 
<212> DNA 

<213> Artificial Sequence 
<220> 

<223> sequencing oligonucleotide PrimerRP 
<400> 63 

caggaaacag ctatgacc 